Activity & research report
Marco Cagnazzo
Paris, September 2013
Overview
• Activity report
– Teaching and PhD supervision
– Projects and other activities
– Bibliometrics
• Research themes
– Video coding optimization
• Motion representation
• 3D video coding
– Adaptive image compression
– Distributed video coding
• Multiview DVC
• SI effectiveness evaluation
– Robust video streaming
• Streaming protocols
• Network coding
• Conclusions
Timeline
• 2002-2005 : PhD @ University of Naples & University
of Nice-Sophia Antipolis (cotutelle)
• 2005-2006 : Post-doc @ National Multimedia Lab
(Naples) & Assistant professor @ University of Naples
• 2006-2008 : Post-doc @ I3S Lab, Sophia Antipolis
• Since February 2008: Maître de conférences in Digital
Video @ Telecom-ParisTech
Teaching
Name Institution Years
Information Theory “Parthenope” University of Naples 2004-2005 Responsible
Multimedia signal processing “Federico II” University of Naples 2005-2007 Responsible
Compression techniques Telecom-ParisTech 2008- … Co-responsible
Digital video and multimedia Telecom-ParisTech 2009- … Responsible
Digital television Telecom-ParisTech 2009- … Co-responsible, CE
Video over mobile Telecom-ParisTech 2009- … Co-responsible, CE
3D Video Telecom-ParisTech 2010- … Co-responsible, CE
Teaching
• Collaborative Learning Thematic Project
• Tools and applications for signals, images and sound
• Image processing and analysis
• Advanced methods for image processing
• Computer vision
• Web Mining
• Introduction to image processing (ATHENS)
• Multimedia Indexing and Retrieval (ATHENS)
• Short and long student projects (“projets libres” and “stages”)
• Image and video compression
• Video over IP
• Signal and image processing
• Wavelet and signal processing
Total: ≈1100 hours (heures équivalentes TD)
PhD students
Name Years Subject
Marwa Meddeb 2013 - Video-conference with HEVC
Marco Calemme 2012 - 3D Video and Depth coding
Aniello Fiengo 2012 - Rate allocation for video
Giovanni Chierchia 2011 - Convex optimization
Elie Gabriel Mora 2011 - 3D Video Compression
Giovanni Petrazzuoli 2009 - 2013 DVC and IMVS
Abdel-Bassir Abou El Ailah 2009 - 2012 DVC and FRI signals
Claudio Greco 2008 - 2012 Robust video streaming
Thomas Maugey 2007 - 2010 Multiview DVC
In addition, supervision of about a dozen MSc students
Research projects
Name Period Subject
LABNET 2001-2002 Low-complexity video coding
CNRAED 2004-2005 Hyper-spectral image coding
CPRE 46 04 06 11 2006-2007 Region based motion vector coding
Secure Media SIM 2007-2008 Secure video coding over SIM card
AIBER 2008 Wavelet-based scalable video coding
DIVINE 2007-2009 Robust video coding
DITEMOI 2007-2010 Video streaming over wireless networks (*)
PERSEE 2009-2013 Perceptual 2D and 3D video coding (*)
SWAN 2011-2013 Network coding
SURICATE Approved Video protection
WOW Submitted Interactive 3D streaming (**)
(*) Responsible for Telecom-ParisTech
(**) Project coordinator
Moreover: smaller contributions to ACDC, Pingo, Sebastian 2, NeVEx
Other responsibilities
• 8 PhD Thesis committees (4 as examiner, 4 as co-supervisor)
• Area editor for 2 Elsevier journals (SPIC, SIGPRO)
• Reviewer for main journals and conferences in the field
• Participation in conference organization (organizing committees of
MMSP’10, EUVIP’11, EUSIPCO’12, ICIP’14)
• Special session co-organization (EUSIPCO’10, DSP’11, WIAMIS’13,
ASILOMAR’13)
• Academic liaison (correspondant académique) between Telecom-ParisTech
and the University of Naples
• Yearly Erasmus lectures at the University of Naples
• Invited lecture at the Winter Doctoral School, University of Naples
(2010)
• IEEE Senior Member (‘11), IEEE SPS Member, EURASIP Member
Bibliometrics
• 15 journal papers: 13 published, 2 to appear
– One paper selected as “High quality paper” by the IEEE MMTC-R Letter board, and
included in the January 2013 issue
• 4 submitted journal papers: 2 in first round; 2 in preparation for the
second round
• 3 journal papers in preparation
• 59 conference papers: 56 published and 3 to appear
– Two MMSP Top 10% awards
• One standardization contribution
• One co-edited book
– F. Dufaux, B. Pesquet-Popescu, M Cagnazzo (eds.): Emerging Technologies for 3D
Video. Wiley, 2013
• 9 book chapters: 3 published and 6 to appear
• According to the Google Scholar web site, my H-index is equal to 13
(update: August 31, 2013)
VIDEO CODING OPTIMIZATION
1 Standardization contribution
8 Conference papers
1 Submitted journal paper
4 Journal papers
Motion vector representation
• Quantization of motion vectors to reduce their coding
cost
• Motion vector refinement and dense motion vector
representation generated at the decoder
• Lossless coding of segmented motion fields
• Motion estimation for wavelet-based video coding
Motion vector quantization
[Figure: encoder/decoder block diagram with blocks ME, MC, DCT, Q, IDCT and Frame Buffer; labels: λ, v*, θ, Q_p, B(Q_p)]
• M. Cagnazzo, M. Agostini, M. Antonini, G. Laroche, and J. Jung, “Motion vector quantization for efficient low-bitrate video coding,” in SPIE Visual
Communications and Image Processing Conference, vol. 7257, (San Jose, California), 2009.
• S. Corrado, M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “Improving H.264 performances by quantization of motion vectors,” in
Picture Coding Symposium, (Chicago, IL), 2009.
• M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “A new coding mode for hybrid video coders based on quantized motion vectors,” IEEE
Transactions on Circuits and Systems for Video Technology, vol. 21, pp. 946–956, July 2011.
[Figure: motion vector quantization, encoder/decoder block diagram with an additional quantizer Q on the motion vectors; labels: λ, v*, θ, Q_p, v(Q_v), Q_v, B(Q_p, Q_v)]
Quantization step for motion vectors
• Double-pass approach
– Estimation of the best step over a frame
– Actual encoding with the selected step
• Estimation:
– Sum of distortions
– Oracle (used as reference)
• Results: average rate reduction ≈ 4% with respect to H.264
and ≈ 8% with respect to H.264 1/8-pel
NB: All rate reductions for video are measured using the Bjontegaard
metric (approximated average rate reduction for the same PSNR over a
given interval)
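As an illustration of the note above, here is a minimal sketch of a Bjontegaard-style computation, assuming four rate/PSNR points per codec: the log-rate is interpolated as a cubic in PSNR and averaged over the common PSNR interval. The function names are illustrative, and reference implementations may differ in details (for example in how the cubic is fitted).

```python
import math

def lagrange_eval(xs, ys, x):
    """Evaluate at x the polynomial interpolating the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mean_log_rate(psnr, rate, lo, hi):
    """Average of the interpolated log-rate over [lo, hi].

    Simpson's rule is exact for cubics, i.e. for four RD points.
    """
    logs = [math.log(r) for r in rate]
    f = lambda x: lagrange_eval(psnr, logs, x)
    mid = 0.5 * (lo + hi)
    return (f(lo) + 4.0 * f(mid) + f(hi)) / 6.0

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Approximate average rate change (%) of `test` versus `ref` at equal PSNR."""
    # Only the PSNR interval covered by both curves is compared.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    diff = (mean_log_rate(psnr_test, rate_test, lo, hi)
            - mean_log_rate(psnr_ref, rate_ref, lo, hi))
    return (math.exp(diff) - 1.0) * 100.0
```

A negative value means that the tested codec needs less rate than the reference for the same PSNR.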
Differential techniques for ME
[Figure: block diagram. Input video; blocks: BMA ME, Hybrid Coder (H.264 stream: MVs and residual), Differential MV refinement (side info), second Hybrid Coder (enhancement layer: residual)]
M. Cagnazzo and B. Pesquet-Popescu, “Introducing differential motion estimation into hybrid video coders,” in SPIE Visual Communications and Image
Processing Conference, vol. 1, (Huang Shan, An Hui, China), pp. 1–4, 2010.
Differential ME in hybrid video coding
• Layered representation of video
• Base layer compatible with any hybrid technique
• Enhancement layer uses costless refined vectors
$$\delta\mathbf{v}(n,m) = \frac{-e(n,m)}{\lambda + \lVert\boldsymbol{\phi}(n,m)\rVert^{2}}\,\boldsymbol{\phi}(n,m)$$
• The refinement depends on the motion compensated
error image 𝑒 and on the motion compensated
reference image gradient 𝝓
• Proof of principle, small improvements (up to almost
1% rate reduction)
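A minimal sketch of the refinement formula of this slide for a single pixel, assuming that the motion-compensated error e, the reference gradient φ = (gx, gy) and the regularization parameter λ are already available. The function name is illustrative and not taken from the paper.

```python
def refine_vector(error, gradient, lam):
    """Return dv = -e * phi / (lam + ||phi||^2) for one pixel.

    error    -- motion-compensated error e at the pixel
    gradient -- (gx, gy), gradient phi of the motion-compensated reference
    lam      -- regularization parameter lambda (assumed > 0)
    """
    gx, gy = gradient
    scale = -error / (lam + gx * gx + gy * gy)
    return (scale * gx, scale * gy)
```

Because the correction only depends on quantities that the decoder can compute too, the refined vectors do not need to be transmitted, which is why the slide calls them costless.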
Context quantization
• Target: exploit high-order statistical dependencies in
segmented motion fields to reduce the coding rate
(lossless coding)
• Tool: context-based lossless encoder
– Implemented with an arithmetic coder
• Problem: high-order dependencies → large context →
context dilution
– I.e. too many contexts, difficult to estimate conditional
probabilities
• Solution: context quantization
• M. Cagnazzo, M. Antonini, and M. Barlaud, “Mutual information-based context quantization,” Signal Proc.: Image Comm. (Elsevier Science), pp. 64–74,
Jan. 2010.
Context quantization
• Contexts (i.e. sequences of already encoded symbols)
are grouped into classes
• Rate increase: the average information loss of including
a context into a class
$$\mathcal{L}(f) = \sum_{x\in\mathcal{X}} p(x)\, D\big(p(Y|x) \,\|\, p(Y|f(x))\big)$$
𝑥: generic context
𝑌: symbol to encode
𝑓: context quantization function, i.e. context label
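A minimal sketch of how this rate increase can be evaluated for a given labelling, assuming that the distribution of a class is the p(x)-weighted mixture of the distributions of its contexts and that every context has non-zero probability. Function and variable names are illustrative.

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; terms with p[k] == 0 contribute nothing."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def quantization_loss(p_x, p_y_given_x, labels):
    """L(f) = sum_x p(x) D(p(Y|x) || p(Y|f(x))), with labels[x] = f(x)."""
    n_symbols = len(p_y_given_x[0])
    mass = {}      # class label -> total probability of its contexts
    mixture = {}   # class label -> unnormalised p(Y | class)
    for px, py, c in zip(p_x, p_y_given_x, labels):
        mass[c] = mass.get(c, 0.0) + px
        acc = mixture.setdefault(c, [0.0] * n_symbols)
        for k in range(n_symbols):
            acc[k] += px * py[k]
    loss = 0.0
    for px, py, c in zip(p_x, p_y_given_x, labels):
        p_class = [v / mass[c] for v in mixture[c]]
        loss += px * kl_divergence(py, p_class)
    return loss
```

Keeping every context in its own class gives a zero loss, while merging contexts with different statistics makes the loss grow.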
Context quantization
• Problem: finding optimum 𝑓
• Classical approach
– Start with a set of classes
– Move a context 𝑥 from a class 𝑐𝑖 to a class 𝑐𝑗 whenever the
relative entropy 𝐷(𝑝(𝑌|𝑥) ∥ 𝑝(𝑌|𝑐𝑖)) is larger than
𝐷(𝑝(𝑌|𝑥) ∥ 𝑝(𝑌|𝑐𝑗))
– Stopping criterion on the relative improvement of the
objective function ℒ(𝑓) or on the number of iterations
Context quantization
• Classical approach
– Intuitive, very popular, good results
– Some open questions:
• Does the basic step actually reduce the cost function at each
iteration?
• Is it the largest possible reduction?
• If not, what is the largest possible reduction, and can we achieve it?
• Contribution: answers to these questions
Context quantization
• We found the expression of the cost function variation Δℒ associated with
the displacement of a context from one class to another
• We proved that with the classical approach, each iteration actually
reduces the cost function…
• … but not as much as actually possible
• We found the best step
• Rate reductions: up to 3.6% on motion data, and up to a further 5% on
synthetic data (global minimization based on dynamic programming)
ME criterion for WT-based video coding
• WT video coding is based on temporal transform rather than
classical temporal prediction
• Therefore MSE-based ME is not assured to be optimal
• The optimal criterion is the maximization of the coding gain:
$$\mathrm{CG} = \frac{\sum_{i=1}^{M} a_i w_i \sigma_i^2}{\prod_{i=1}^{M} \left(w_i \sigma_i^2\right)^{a_i}}$$
• where $i$ is the subband index, $\sigma_i^2$ the variance, $a_i$ the relative
number of coefficients, and $w_i$ the normalization factor of the $i$-th
subband
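A minimal sketch of the coding gain formula, assuming that the per-subband variances, relative sizes (summing to one) and normalization factors are given as plain lists. The function name is illustrative.

```python
import math

def coding_gain(variances, fractions, weights):
    """CG = sum_i a_i w_i s_i / prod_i (w_i s_i)^a_i, with s_i the subband variances."""
    terms = list(zip(variances, fractions, weights))
    numerator = sum(a * w * s for s, a, w in terms)
    denominator = math.prod((w * s) ** a for s, a, w in terms)
    return numerator / denominator
```

With unit weights this is the ratio between the weighted arithmetic mean and the weighted geometric mean of the subband variances, so it equals 1 when all variances are equal and grows as the energy concentrates in a few subbands.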
• M. Cagnazzo, F. Castaldo, T. André, M. Antonini, and M. Barlaud, “Optimal motion estimation for wavelet video coding,” IEEE Trans. Circuits Syst. Video
Technol., vol. 17, pp. 907–911, July 2007.
ME criterion for WT-based video coding
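• We showed that we only have to minimize $\rho^2 = \prod_{i=1}^{M} \left(\sigma_i^2\right)^{a_i}$
• In general each MV influences all the subbands, so the problem is still complex
• However, the CG can be analytically maximized for a particular class of MC-ed
lifting schemes, the $(N, 0)$ LS:
$$(\mathbf{v}_B^*, \mathbf{v}_F^*) = \arg\min_{\mathbf{v}_B, \mathbf{v}_F} \left[ \mathcal{E}(\epsilon_B) + \mathcal{E}(\epsilon_F) + 2\langle \epsilon_B, \epsilon_F \rangle \right]$$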
• Average rate reduction: 8%
[Figure: (2,0) lifting scheme. Input frames x1, x2, …, x9, …; high-frequency subband h1, …, h4; low-frequency subband l1, …, l4]
3D video coding
• MVD format: multiple views plus depth
• Inter-view and inter-component redundancy
• Three contributions for the upcoming standard 3D-HEVC
Modification of the Merge candidate list for 3D-VC
• In the Merge mode, a block is predicted using a vector from a
short list (Merge list)
• Coding the index in the list is much less costly than coding the vector
• The vector can be a motion vector (MV) or a disparity vector (DV)
• In 3D-HEVC, MVs are much more frequently selected than DVs
• We have proposed to insert a further DV in the Merge list
• Several positions in the primary and secondary list have been
tested
• Best results obtained with the first position of the secondary list
• We obtained both a rate reduction (0.6%) and a complexity
reduction (4%)
• Contribution accepted into the standard
• E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. "Modification of the merge candidate list for dependent views in 3D-HEVC". In IEEE International
Conference on Image Processing, September 2013. Melbourne, Australia.
• E. Mora, B. Pesquet, M. Cagnazzo and J. Jung. Modification of the Merge Candidate List for Dependent Views in 3DV-HTM. Document JCT3V-B0069 for
Shanghai meeting (MPEG number m26793). Shanghai (PRC), October 2012.
Intra mode inheritance for 3D-HEVC
• Observation: blocks with strong
contours and one dominant
direction tend to be encoded
with the same Intra directional
mode in Texture and Depth
• Idea: when coding Depth, add the
co-located Intra mode to the
Most Probable Mode list when a
dominant direction is detected
• Dominant direction is revealed by
the presence of a single peak in
the histogram of the gradient
angle for the current block
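A minimal sketch of a dominant-direction test based on the histogram of the gradient angle. It assumes central-difference gradients, a fixed number of angle bins, and a "single peak" rule stating that one bin must hold most of the votes; the parameters n_bins, peak_share and min_magnitude are hypothetical choices, not values from the paper.

```python
import math

def dominant_direction(block, n_bins=8, peak_share=0.6, min_magnitude=1.0):
    """Return the dominant gradient-angle bin of a 2-D block, or None.

    block is a list of rows of pixel values. None means that no single
    peak was found in the angle histogram (assumed rule: the largest bin
    must hold at least `peak_share` of the votes).
    """
    hist = [0] * n_bins
    rows, cols = len(block), len(block[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (block[r][c + 1] - block[r][c - 1]) / 2.0
            gy = (block[r + 1][c] - block[r - 1][c]) / 2.0
            if math.hypot(gx, gy) < min_magnitude:
                continue  # flat pixel: its angle is unreliable
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            hist[min(int(angle / math.pi * n_bins), n_bins - 1)] += 1
    total = sum(hist)
    if total == 0:
        return None
    peak = max(range(n_bins), key=hist.__getitem__)
    return peak if hist[peak] >= peak_share * total else None
```

When a bin is returned for the co-located texture block, its Intra mode would be the candidate to add to the Most Probable Mode list of the depth block.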
• E. Mora, J. Jung, M. Cagnazzo, and B. Pesquet-Popescu, “Codage de vidéos de profondeur basé sur l’héritage des modes intra de texture,” in
Compression et Représentation des Signaux Audiovisuels, vol. 1, (Lille, France), pp. 1–4, 2012.
• E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. “Depth Video Coding Based on Intra Mode Inheritance From Texture”. Submitted to APSIPA
Transactions on Signal and Information Processing (2013)
1. Find the reference texture block
2. Compute the gradient statistics
3. If a dominant direction is detected, add to the MPM list
Intra mode inheritance for 3D-HEVC
• The dominant angle proved to
be an effective feature for
detecting blocks where inheritance
is beneficial
• Inserting the inherited mode in
the MPM list allows an average
coding rate reduction of ≈1%
• Tests performed over MPEG
sequences under Common Test
Conditions
Enhanced quad-tree coding for 3D-HEVC
• The 3D-HEVC codec uses quad-trees for encoding
texture and depth
• These trees are quite correlated
• We propose an inter-component coding tool for
reducing both complexity and rate by exploiting the
quad-tree redundancy
• Two variants, according to the component that is
encoded first (texture or depth)
• Contribution to 3D-HEVC working draft and
reference software
• E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. “Initialization, limitation and predictive coding of the depth and texture quad-tree in 3D-HEVC
Video Coding”. Accepted into IEEE Transactions on Circuits and Systems for Video Technology
Enhanced quad-tree coding for 3D-HEVC
• Observation: texture coding units are very often at least as finely
partitioned as depth
• Therefore we can limit the depth map partitioning level if we know
texture…
• … or we can initialize the texture partitioning if we know depth
• Complexity reduction (fewer configurations to examine): up to 31%
encoder time saving
• Rate reduction (easier prediction of coding modes): up to 1.8%
Don’t Care Regions
A depth pixel only needs to be reconstructed such that the
resulting geometric error leads to an acceptable distortion in the
synthesized view
[Figure: error in the synthesized pixel value as a function of the disparity value, with the DCR marked]
G. Valenzise, G. Cheung, R. Galvao, M. Cagnazzo, B. Pesquet-Popescu, and A. Ortega, “Motion prediction of depth video for depth-image-based rendering
using Don’t Care Regions,” in Picture Coding Symposium, vol. 1, (Krakow, Poland), pp. 1–4, 2012.
DCR Example (Kendo, frame 10, t = 5)
Don’t Care Regions
We embedded DCR into an H.264/AVC encoder, changing
three basic aspects:
1. Motion estimation
2. Residual coding
3. Skip mode
Don’t Care Regions
• We compute and encode prediction residuals with respect to the
DCRs
• For SKIP mode, no prediction residuals are coded
– The reconstructed values could be far outside the DCR,
leading to an arbitrarily high distortion in the synthesized
view
– We adopt a conservative policy: prevent SKIP selection when
any reconstructed pixel is outside its DCR
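A minimal sketch of the two rules above, assuming that each DCR is an interval [low, high] of acceptable depth values per pixel and that, in SKIP mode, the reconstructed block equals the prediction. The residual rule (distance to the nearest value inside the DCR) is an assumption made for illustration; the encoder described in the paper may compute it differently.

```python
def skip_allowed(predicted, dcr_low, dcr_high):
    """Conservative policy: SKIP only if every predicted pixel lies inside its DCR."""
    return all(lo <= p <= hi for p, lo, hi in zip(predicted, dcr_low, dcr_high))

def residual_wrt_dcr(predicted, dcr_low, dcr_high):
    """Residual towards the nearest value inside the DCR (assumed rule).

    The residual is zero whenever the prediction already falls in the DCR.
    """
    residuals = []
    for p, lo, hi in zip(predicted, dcr_low, dcr_high):
        target = min(max(p, lo), hi)  # clamp the prediction into [lo, hi]
        residuals.append(target - p)
    return residuals
```

Wider DCRs produce more zero residuals, which is where the rate saving comes from.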
• Results: average rate saving of 7%
• High preprocessing complexity
Other work in 3D video coding
• Dense disparity field for MVV and MVD coding
• Depth coding using an elastic curve model
• I. Daribo, M. Kaaniche, W. Miled, M. Cagnazzo, and B. Pesquet-Popescu, “Dense disparity estimation in multiview video coding,” in IEEE Workshop on
Multimedia Signal Processing, (Rio de Janeiro, Brazil), 2009.
• M. Cagnazzo and B. Pesquet-Popescu, “Depth map coding by dense disparity estimation for MVD compression,” in IEEE Digital Signal Processing,
(Corfu, Greece), 2011.
• E. Mora, J. Jung, B. Pesquet-Popescu, M. Cagnazzo. "Modification of the disparity vector derivation process in 3D-HEVC". In IEEE Workshop on
Multimedia Signal Processing, vol. 1, September 2013. Cagliari, Italy.
OBJECT-BASED IMAGE CODING
6 Conference papers
2 Journal papers
Region-based hyperspectral image coding
[Figure: coding scheme. Blocks: Multispectral / Hyperspectral Image, Segmentation (TS-VQ), Map, Map Coding, Region Coding]
• M. Cagnazzo, R. Gaetano, S. Parrilli, and L. Verdoliva, “Region based compression of multispectral images by classified KLT,” in EUSIPCO. 2006.
• M. Cagnazzo, R. Gaetano, S. Parrilli, and L. Verdoliva, “Adaptive region-based compression of multispectral images,” in Proceed. of IEEE Intern. Conf.
Image Proc., (Atlanta, GA), pp. 3249–3252, Oct. 2006
• M. Cagnazzo, S. Parrilli, G. Poggi, and L. Verdoliva, “Costs and advantages of object-based image coding with shape-adaptive wavelet transform,”
EURASIP J. Image Video Proc., 2007
Region-based hyperspectral image coding
• Spectral transform: WT, global KLT, class-based KLT,
region-based KLT
• Spatial transform: WT, SA-WT
• Encoder: SA-SPIHT with optimal rate allocation among
objects
• Results:
– 0.5 dB better than JP2K-Multicomponent
– Better post-processing (i.e. classification) results
• M. Cagnazzo, G. Poggi, and L. Verdoliva, “Region-based transform coding of multispectral images,” IEEE Trans. on Image Processing, vol. 16, pp. 2916–2926, Dec. 2007.
Region-based hyperspectral image coding
[Figure: AVIRIS image, 32 bands, 0.3 bps (original @ 16 bps)]
[Figure: Landsat TM image, 6 bands, 0.6 bps (original @ 8 bps)]
Adaptive wavelet and rate allocation
• Adaptive wavelets (implemented via lifting schemes) allow the
filters to change according to the signal characteristics
• Further constraint: reconstruction without sending side
information
[Figure: adaptive lifting scheme with blocks Split, D, U and −P; input x(k), signal d(k), outputs xa(k) = y00(k) and xd(k) = y01(k)]
• S. Parrilli, M. Cagnazzo, and B. Pesquet-Popescu, “Distortion evaluation in transform domain for adaptive lifting schemes,” in IEEE Workshop on
Multimedia Signal Processing, (Cairns, Australia), pp. 200–205, 2008.
• S. Parrilli, M. Cagnazzo, and B. Pesquet-Popescu, “Estimation of quantization noise for adaptive-prediction lifting schemes,” in IEEE Workshop on
Multimedia Signal Processing, (Rio de Janeiro, Brazil), 2009.
Adaptive wavelet and rate allocation
• The resulting transform is highly non-orthogonal
• Problem: distortion evaluation in the transform domain
in order to perform rate allocation
• Solutions for uncorrelated noise
– Good error energy evaluation
– Performance improvement for ALS up to 3dB
– Improved SSIM (+3%)
• M. Cagnazzo and B. Pesquet-Popescu, “Perceptual impact of transform coefficients quantization for adaptive lifting schemes,” in International
Workshop on Video Processing and Quality Metrics for Consumer Electronics, (Scottsdale, AZ), 2010.
• M. Abid, M. Cagnazzo, and B. Pesquet-Popescu, “Image denoising by adaptive lifting schemes,” in European Workshop on Visual Information
Processing, vol. 1, (Paris, France), 2010
DISTRIBUTED VIDEO CODING
17 Conference papers
2 Submitted journal papers
3 Journal papers
Distributed video coding
• Coding of many correlated sources
• Encoders do not communicate one with another
• Same RD performance as centralized coding (in theory only!)
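[Figure: DVC architecture. Encoder: Wyner-Ziv frames (WZ) go through a quantizer and a turbo encoder with buffer (Slepian-Wolf coder); key frames (KF) go through an Intra coder. Decoder: Intra decoder, image interpolation producing the SI, turbo decoder, minimum-distortion reconstruction; outputs: decoded KFs and decoded WZFs]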
Image interpolation: high-order trajectories for ME in DVC
• G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu. "High order motion interpolation for side information improvement in DVC". In International
Conference on Acoustics, Speech and Signal Processing, March 2010. Dallas, TX
• G. Petrazzuoli, M. Cagnazzo, and B. Pesquet-Popescu, “Fast and efficient side information generation in distributed video coding by using dense motion
representation,” in European Signal Processing Conference, (Aalborg, Denmark), 2010.
• G. Petrazzuoli, T. Maugey, M. Cagnazzo, and B. Pesquet-Popescu, “Side information refinement for long duration GOPs in DVC,” in IEEE Workshop on
Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010.
Rate −3.3%
Image interpolation: pel-based motion estimation
• Block-based object trajectory used as initialization
• Within each block, pixel-by-pixel vectors are
obtained by refining the initialization (Cafforio-Rocca
algorithm)
• The refinement equations have been rewritten and
solved, since in this case the reference image
does not exist
• Rate reductions: 3.5% to 6%
• M. Cagnazzo, T. Maugey, and B. Pesquet-Popescu, “A differential motion estimation method for image interpolation in distributed video coding,” in
International Conference on Acoustics, Speech and Signal Processing, vol. 1, (Taiwan), pp. 1861–1864, 2009.
• W. Miled, T. Maugey, M. Cagnazzo, and B. Pesquet-Popescu, “Image interpolation with dense disparity estimation in multiview distributed video
coding,” in International Conference on Distributed Smart Cameras, (Como, Italy), 2009.
• T. Maugey, W. Miled, M. Cagnazzo, and B. Pesquet-Popescu, “Méthodes denses d’interpolation de mouvement pour le codage vidéo distribué monovue
et multivue,” in Colloque GRETSI - Traitement du Signal et des Images, (Dijon (France)), 2009.
• M. Cagnazzo, W. Miled, T. Maugey, and B. Pesquet-Popescu, “Image interpolation with edge-preserving differential motion refinement,” in IEEE
International Conference on Image Processing, vol. 1, (Cairo, Egypt), pp. 361–364, 2009.
The Cafforio-Rocca algorithm: sample results
Local and global SI fusion
• Given the WZF, feature points on the reference frames are
extracted by SIFT
• Matching features make it possible to perform global motion compensation
(first SI)
• Local motion compensation (traditional method) is also performed
(second SI)
• The two SIs are merged using partial channel decoding and
re-estimating motion
• Experiments show average rate reduction of ≈ 25% with respect to
literature references
• A. Abou-El Ailah, F. Dufaux, M. Cagnazzo, B. Pesquet-Popescu, and J. Farah, “Successive refinement of side information using adaptive search area for
long duration GOPs in distributed video coding,” in International Conference on Telecommunications, (Beirut), 2012.
• A. Abou-El Ailah, F. Dufaux, M. Cagnazzo, and J. Farah, “Fusion of global and local side information using support vector machine in transform-domain
DVC,” in EUSIPCO, vol. 1, (Bucharest, Romania), pp. 1–5, Aug. 2012.
• A. Abou-El Ailah, G. Petrazzuoli, J. Farah, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Side Information Improvement in Transform-Domain Distributed
Video Coding". In SPIE - Applications of Digital Image Processing,. San Diego, CA (USA), Aug. 2012
• A. Abou-El Ailah, F. Dufaux, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Fusion of global and local motion estimation for distributed video coding,”
IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, n. 1, pp. 158-172, Jan. 2013.
Multiview DVC
• Motion models for temporal
image interpolations
– High order motion
interpolation
– Pixel-based motion vector
refinement
• Multi-hypothesis SI fusion
based on observed parity bits
and Bayesian classification
[Figure: frame arrangement over views and time, a grid of key frames (KF) and Wyner-Ziv frames (WZ)]
• G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu. “Novel solutions for side information generation and fusion in multiview distributed video coding”.
Submitted to Eurasip Journal of Advances in Signal Processing
Multiview DVC
• Step 1: produce a temporal estimation with HOMI
• Step 2: produce an inter-view estimation with occlusion
reduction (use disparity to estimate foreground
objects)
• Step 3: produce a fusion of the two estimations using
Left-Right Consistency Check to remove residual
occlusions
• Step 4: Select one out of these three images as side
information
Multiview DVC
• For one image out of 𝑁 we ask for parity bits for
temporal and inter-view estimation
• We compare the number of bits needed for correcting
the two estimations:
– If they are close, we choose the fusion image
– If not, we select the image with the lowest rate
• Equivalent to Bayesian decision
$$\hat{D} = \arg\max_{d} P(D = d \mid \delta_R) = \arg\max_{d}\, p(\delta_R \mid d)\, P(d) = \arg\max_{d} f_d(\delta_R)$$
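A minimal sketch of the selection rule described on this slide, assuming that the two decoding rates are measured in bits and that "close" means a relative gap below a threshold. The `closeness` parameter and the function name are hypothetical; in the Bayesian formulation the decision regions would instead follow from the functions f_d.

```python
def select_side_information(rate_temporal, rate_interview, closeness=0.1):
    """Choose which image to use as side information.

    Returns "fusion" when the two decoding rates are close, otherwise the
    estimation that needs fewer parity bits. `closeness` is an assumed
    relative threshold, not a value from the paper.
    """
    delta = rate_temporal - rate_interview
    if abs(delta) <= closeness * min(rate_temporal, rate_interview):
        return "fusion"
    return "temporal" if delta < 0 else "interview"
```

The rule only needs the parity-bit counts that the decoder observes anyway for the one image out of N on which both estimations are corrected.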
Multiview DVC
• Experiments show that the Bayesian classifier very often selects
the best SI
• It may be wrong only when the decoding rates are very close to
each other, in which case selecting a suboptimal SI does not
degrade performance
• Cumulated gain with respect to the state of the art: ≈ 9.1%
rate reduction
Side information effectiveness
• Side information is corrected with parity bits to produce the
decoded WZ frame
• Intuitively, the more similar the SI is to the original image, the fewer
parity bits are needed
• Traditionally, PSNR between SI and WZF has been used to evaluate
the SI quality
• However, it is easy to build toy examples where two iso-PSNR
images require very different numbers of correction bits
– Example 1: SI PSNR 29.1 dB, parity bits 137 kb, decoded quality 39.3 dB
– Example 2: SI PSNR 29.1 dB, parity bits 192 kb, decoded quality 35.4 dB
• T. Maugey, J. Gauthier, M. Cagnazzo, B. Pesquet. “Evaluation of side information effectiveness in distributed video coding”. IEEE TCSVT, accepted
Side information effectiveness
• Questions: why is PSNR not always reliable? Can we find better metrics?
• Applications: Hash-based DVC systems, Witsenhausen-Wyner video coding
systems, …
• New framework for metric comparison based on end-to-end RD
performance
• Proposed metrics:
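$$\mathrm{SIQ}_a(I_0, I_1) = 10 \log_{10} \frac{255^2}{\sum_{\mathbf{p}} \left| I_0(\mathbf{p}) - I_1(\mathbf{p}) \right|^{a}}$$
$$\mathrm{HSIQ}(I_0, I_1) = 10 \log_{10} \frac{N_{\mathrm{bits}}}{d_H(I_0, I_1)}$$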
• SIQ1 and HSIQ improve on PSNR for both theoretical and practical
effectiveness measures (hash-based system: 20% rate reduction)
• PSNR works well for homogeneous errors and starts failing for large but
spatially concentrated errors
• T. Maugey, C. Yaacoub, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Side information enhancement using an adaptive hash-based genetic algorithm
in a Wyner-Ziv context,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010
IMVS using DVC
• All frames are Intra coded
– Each image is coded and stored only once
– Large bandwidth required
– Relatively low server space required
• P-frames are used: all possible frame dependencies are coded
– Each image is coded many times
– Smallest bandwidth required
– Very large server space required
• WZ-frames are used: only parity bits are coded
– Each image is coded and stored only once
– Trade-off between server space and bandwidth
[Figure: bandwidth versus server space. Labels: Only Intra; Predictive coding (each image coded many times); Ideal case (path known at encoding time); WZ coding; Operation region]
IMVS for MVD using DVC
• We proposed several strategies for view-switching
• The best (adaptive) achieves a rate reduction of more
than 15% with respect to reference methods
G. Petrazzuoli, M. Cagnazzo, F. Dufaux, and B. Pesquet-Popescu, “Using distributed source coding and depth image based rendering to improve interactive
multiview video access,” in IEEE International Conference on Image Processing, vol. 1, (Bruxelles, Belgium), pp. 605–608, 2011.
G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Enabling Immersive Visual Communications through Distributed Video Coding". IEEE MMTC
E-Letter (May 2013).
Other work on DVC
• Fusion schemes for multiview DVC
• Iterative methods for SI refinement
• DVC for multiple-view-plus-depth video
• DVC and interactive multiview streaming
• Local and global SI fusion
• Nine further conference papers
• A. Abou-El Ailah, F. Dufaux, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Fusion of global and local motion estimation for distributed video coding,”
IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, n. 1, pp. 158-172, Jan. 2013.
• G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Enabling Immersive Visual Communications through Distributed Video Coding". IEEE
MMTC E-Letter (May 2013).
ROBUST VIDEO DISTRIBUTION
7 Conference papers
1 Submitted journal paper + 2 in preparation
2 Journal papers
ABCD protocol
• Problem: reliable diffusion of video over wireless networks
• Construction of overlays to carry MDC video
• Minimization of the number of sent packets (both video
and management packets)
• First contribution: a reliable extension of the IEEE 802.11
broadcast communication, using a control peer
• Once a reliable broadcast channel is provided, the nodes
attach to the stream as soon as they hear about it
• C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “H.264-based multiple description coding using motion compensated temporal interpolation,” in IEEE
Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010
• C. Greco, G. Petrazzuoli, M. Cagnazzo, and B. Pesquet-Popescu, “An MDC-based video streaming architecture for mobile networks,” in IEEE Workshop
on Multimedia Signal Processing, vol. 1, (Hangzhou, China), pp. 1–4, 2011.
• C. Greco and M. Cagnazzo, “A cross-layer protocol for cooperative content discovery over mobile ad-hoc networks,” International Journal of
Communication Networks and Distributed Systems, vol. 6, July 2011.
• C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “ABCD : Un protocole cross-layer pour la diffusion vidéo dans des réseaux sans fil ad-hoc,” in Colloque
GRETSI - Traitement du Signal et des Images, (Bordeaux, France), 2011.
ABCD protocol
[Figure: overlay construction with source 𝑠 and peers 𝑝1, 𝑝2; messages: Advertisement, Attachment]
[Figure: video delivery with source 𝑠 and peers 𝑝1, 𝑝2, 𝑝3, 𝑝4; messages: Video Data, Video Data & Attachment, Attachment]
ABCD protocol: parent switch
$$p^* = \arg\min_{p} \left[ w_h h(p) + w_a a(p) + w_d d(p) - w_g g(p) \right]$$
ABCD: simulation results (ns2)
ABCD/CoDiO
• ABCD may suffer from high delay in large, crowded networks
• To reduce the delay, we introduced a Congestion-Distortion
Optimization (CoDiO) in the per-hop wireless broadcast
transmission
• We adjust the RTS/CTS retry limit k of each packet in a Co-Di
optimized fashion
• Small values of k reduce the congestion but the distortion
increases, as the probability of obtaining the channel is lower
• High values of k lower the distortion, but congestion increases
due to the channel occupation
Cost function: $J(k) = D(k) + \lambda C(k)$
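A minimal sketch of how a retry limit could be chosen from this cost function, assuming that models of the distortion D(k) and of the congestion C(k) are available as functions. The toy models in the usage example are purely hypothetical and only reproduce the trends stated above (D decreases with k, C increases with k).

```python
def best_retry_limit(distortion, congestion, lam, candidates):
    """Return the retry limit k that minimises J(k) = D(k) + lam * C(k)."""
    return min(candidates, key=lambda k: distortion(k) + lam * congestion(k))

# Hypothetical toy models, for illustration only.
def toy_distortion(k):
    return 100.0 * 0.5 ** k   # fewer losses when more retries are allowed

def toy_congestion(k):
    return float(k)           # channel occupation grows with the retries

k_star = best_retry_limit(toy_distortion, toy_congestion, lam=5.0,
                          candidates=range(8))
```

A larger λ penalises channel occupation more strongly and therefore pushes the choice towards smaller values of k.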
• C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “Low-latency video streaming with congestion control in mobile ad-hoc networks,” IEEE Transactions
on Multimedia, vol. 14, n. 4, pp. 1337-1350, Aug. 2012. Paper selected as “High quality paper” by the IEEE MMTC-R Letter board
ABCD/CoDiO
Challenges:
• Model the effects of a single-node decision on the
entire network
• Even if a node switches off, alternative paths may be
formed
• Information about alternative paths is gathered at
leaves and conveyed upstream
• The information is refined where it actually matters,
i.e. near the root – where a single decision affects a lot
of nodes
ABCD/CoDiO: simulation results (ns2)
Network coding for video delivery
• Network coding (NC) can increase network throughput by letting
intermediate nodes process packets instead of simply relaying
them
• NC can easily be extended to wireless networks
Network coding
• Using ABCD as an overlay to implement NC in wireless
networks
• Optimized scheduling for MDC in Expanded Window
NC
• Optimized scheduling for multiview video over NC
• Blind source separation for reducing the NC overhead
Network coding for video delivery
• RDO-scheduling in NC-based delivery
• A generation is composed of the frames of a multi-view GOP
or an MDC GOP
• Each node must decide the schedule of frames
• I. Nemoianu, C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “A framework for joint multiple description coding and network coding over wireless ad-hoc networks,” in International Conference on Acoustics, Speech and Signal Processing, (Kyoto, Japan), 2012.
• I. Nemoianu, C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “A network coding scheduling for multiple description video streaming over wireless networks,” in EUSIPCO, vol. 1, (Bucharest, Romania), pp. 1–5, Aug. 2012.
• I. Nemoianu, C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “Multi-view video streaming over wireless networks with RD-optimized scheduling of network coded packets,” in SPIE Visual Communications and Image Processing Conference, (San Diego, CA, USA), Nov. 2012.
71
Network coding for video delivery
• RDO calls for a unique scheduling (send first the frame
that maximally reduces the RD cost function)
• NC calls for different scheduling at each node (pseudo-
random selection) in order to maximize the throughput
• Solution: collect frames into groups with “similar”
RD characteristics, and select randomly within each group
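The grouping idea can be sketched as follows. This is only an illustration under stated assumptions: each frame is described by a hypothetical scalar `rd_gain` (how much sending it reduces the RD cost), frames whose gain lies within a hypothetical `tolerance` of the best frame of a group are considered "similar", and the order inside a group is drawn at random. The clustering rule actually used in the cited papers is not given on the slide.

```python
import random


def group_by_rd(frames, tolerance):
    # frames: list of (frame_id, rd_gain) pairs.
    # Sort by decreasing gain, then open a new group whenever a frame is
    # more than `tolerance` away from the best frame of the current group.
    ordered = sorted(frames, key=lambda f: f[1], reverse=True)
    groups = []
    for frame in ordered:
        if groups and groups[-1][0][1] - frame[1] <= tolerance:
            groups[-1].append(frame)
        else:
            groups.append([frame])
    return groups


def schedule(frames, tolerance, rng):
    # Groups are visited in RD order (the RDO requirement); inside a group
    # the order is random, so different nodes tend to send different frames
    # (the NC requirement).
    order = []
    for group in group_by_rd(frames, tolerance):
        shuffled = group[:]
        rng.shuffle(shuffled)
        order.extend(frame_id for frame_id, _ in shuffled)
    return order
```

For example, `schedule([("I0", 10.0), ("P1", 9.5), ("B3", 4.0)], 1.0, random.Random(0))` always sends `I0` and `P1` (in some order) before `B3`.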
72
BSS for NC
• In NC the intermediate nodes of a network send linear
combinations of the packets they have previously received, with
random coefficients taken from a finite field
• The random coefficients must be added to the packet as
headers, incurring an overhead
• In a blind source separation (BSS) based approach, it may be
possible to relieve the nodes of the need to include the
coefficients in the packets
• BSS consists in recovering a set of source signals 𝑆 from a set of
mixed signals 𝑋 = 𝑓(𝑆), also referred to as observations,
knowing neither the sources themselves nor the mixing process
parameters; in NC the mixing is linear, 𝑋 = 𝐴𝑆
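To make the linear model 𝑋 = 𝐴𝑆 concrete, here is a minimal sketch of coding and conventional (non-blind) decoding over GF(2), where a linear combination reduces to a bitwise XOR. The choice of GF(2), the representation of packets as Python integers and the function names are assumptions made for this illustration only. Decoding needs the rows of 𝐴, which is exactly the information carried by the packet headers.

```python
def mix(sources, coeffs):
    # One coded packet over GF(2): XOR of the sources whose coefficient is 1.
    out = 0
    for s, c in zip(sources, coeffs):
        if c:
            out ^= s
    return out


def decode(coded, coeff_rows):
    # Recover the sources from n coded packets by Gauss-Jordan elimination
    # over GF(2). coeff_rows[i] is the coefficient vector (header) of coded[i].
    # Returns None if the coefficient matrix A is not invertible.
    n = len(coeff_rows)
    rows = [list(r) for r in coeff_rows]
    data = list(coded)
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        data[col], data[pivot] = data[pivot], data[col]
        for r in range(n):
            if r != col and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[col])]
                data[r] ^= data[col]
    return data
```

A BSS-based receiver would have to recover 𝑆 from the coded packets alone, without the `coeff_rows` argument.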
• I. Nemoianu, C. Greco, M. Castella, B. Pesquet-Popescu, and M. Cagnazzo, “On a practical approach to source separation over finite fields for network coding applications,” in International Conference on Acoustics, Speech and Signal Processing, (Vancouver, Canada), May 2013.
73
BSS for NC
• BSS approach from the literature for finite fields:
– Iterative scan of packet combinations
– Minimization of a contrast function
• Our idea: add to packets a signature that is degraded by linear
combination
• Then, the contrast function can be computed only on candidates
having a valid signature
• Problem: how to choose the signature so as to reduce the probability
that a linear combination of packets still carries a valid signature
• Simple solution: odd-parity bit
• Drastic reduction of the search space
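The effect of an odd-parity signature can be sketched in a toy setting. The assumptions here are mine, not the slide's: packets are combined over GF(2) (bitwise XOR), they are represented as Python integers, and the signature is a single bit appended so that every source packet has an odd number of ones. The field, packet format and search procedure of the cited paper are not described on the slide.

```python
from itertools import combinations


def popcount(x):
    return bin(x).count("1")


def add_odd_parity(payload):
    # Append one bit so that the signed packet has an odd number of ones.
    bit = 1 - popcount(payload) % 2
    return (payload << 1) | bit


def has_valid_signature(packet):
    return popcount(packet) % 2 == 1


def surviving_candidates(packets):
    # Enumerate every non-empty XOR combination of the packets and keep the
    # index subsets whose combination still carries a valid signature.
    survivors = []
    for r in range(1, len(packets) + 1):
        for subset in combinations(range(len(packets)), r):
            candidate = 0
            for i in subset:
                candidate ^= packets[i]
            if has_valid_signature(candidate):
                survivors.append(subset)
    return survivors
```

In this toy GF(2) setting, the XOR of an even number of signed packets always has even parity and is rejected, so only combinations of an odd number of packets remain as candidates for the contrast function.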
74
CONCLUSION
Perspectives
• “Classical” video coding: advanced models for rate control
• 3D VC:
– combined use of motion and disparity compensation to produce
improved reference frames;
– elastic deformation model for lossless coding of depth contours
• DVC:
– Improved SI generation using an elastic deformation model for
estimating object shapes;
– Geometry-based DVC system for MVD (no backward channel, no
channel coding)
• NC and streaming: use of “social” information to optimize
interactive multiview streaming with a NC approach
76
New themes
• Forensics, forgery detection
• Feature representation and compression
• Video protection
• Immersive communications: holoscopy / holography,
high dynamic range
77

Research and activity report

  • 1. Activity & research report Marco Cagnazzo Paris, September 2013
  • 2. Overview • Activity report – Teaching and PhD supervision – Projects and other activities – Bibliometrics • Research themes – Video coding optimization • Motion representation • 3D video coding – Adaptive image compression – Distributed video coding • Multiview DVC • SI effectiveness evaluation – Robust video streaming • Streaming protocols • Network coding • Conclusions 2
  • 3. Timeline • 2002-2005 : PhD @ University of Naples & University of Nice-Sophia Antipolis (cotutelle) • 2005-2006 : Post-doc @ National Multimedia Lab (Naples) & Assistant professor @ University of Naples • 2006-2008 : Post-doc @ I3S Lab, Sophia Antipolis • Since February 2008: Maître de conférences in Digital Video @ Telecom-ParisTech 3 2002 2004 2006 2008 2010 2012 2014
  • 4. Teaching 4 Name Institution Years Information Theory “Parthenope” University of Naples 2004-2005 Responsible Multimedia signal processing “Federico II” University of Naples 2005-2007 Responsible Compression techniques Telecom-ParisTech 2008- … Co-responsible Digital video and multimedia Telecom-ParisTech 2009- … Responsible Digital television Telecom-ParisTech 2009- … Co-responsible, CE Video over mobile Telecom-ParisTech 2009- … Co-responsible, CE 3D Video Telecom-ParisTech 2010- … Co-responsible, CE
  • 5. Teaching • Collaborative Learning Thematic Project • Tools and applications for signals, images and sound • Image processing and analysis • Advanced methods for image processing • Computer vision • Web Mining • Introduction to image processing (ATHENS) • Multimedia Indexing and Retrieval (ATHENS) • Short and long student projects (“projet libres” and “stages”) • Image and video compression • Video over IP • Signal and image processing • Wavelet and signal processing Total: ≈1100 hours (heures équivalentes TD) 5
  • 6. PhD students Name Years Subject Marwa Meddeb 2013 - Video-conference with HEVC Marco Calemme 2012 - 3D Video and Depth coding Aniello Fiengo 2012 - Rate allocation for video Giovanni Chierchia 2011 - Convex optimization Elie Gabriel Mora 2011 - 3D Video Compression Giovanni Petrazzuoli 2009 - 2013 DVC and IMVS Abdel-Bassir Abou El Ailah 2009 - 2012 DVC and FRI signals Claudio Greco 2008 - 2012 Robust video streaming Thomas Maugey 2007 - 2010 Multiview DVC In addition to a dozen of MSc students supervision 6
  • 7. Research projects Name Period Subject LABNET 2001-2002 Low-complexity video coding CNRAED 2004-2005 Hyper-spectral image coding CPRE 46 04 06 11 2006-2007 Region based motion vector coding Secure Media SIM 2007-2008 Secure video coding over SIM card AIBER 2008 Wavelet-based scalable video coding DIVINE 2007-2009 Robust video coding DITEMOI 2007-2010 Video streaming over wireless networks (*) PERSEE 2009-2013 Perceptual 2D and 3D video coding (*) SWAN 2011-2013 Network coding SURICATE Approved Video protection WOW Submitted Interactive 3D streaming (**) (*) Responsible for Telecom-ParisTech (**) Project coordinator Moreover: smaller contributions to ACDC, Pingo, Sebastian 2, NeVEx 7
  • 8. Other responsibilities • 8 PhD Thesis committees (4 as examiner, 4 as co-supervisor) • Area editor for 2 Elsevier journals (SPIC, SIGPRO) • Reviewer for main journals and conferences in the field • Participation to conference organization (Organizing committees of MMSP’10, EUVIP’11, EUSIPCO’12, ICIP’14) • Special session co-organization (EUSIPCO’10, DSP’11, WIAMIS’13, ASILOMAR’13) • Correspondant académique between Telecom-ParisTech and the University of Naples • Yearly Erasmus lessons at University of Naples • Invited lesson at the Winter Doctoral School, University of Naples (2010) • IEEE Senior Member (‘11), IEEE SPS Member, EURASIP Member 8
  • 9. Bibliometrics • 15 journal papers: 13 published, 2 to appear – One paper selected as “High quality paper” by the IEEE MMTC-R Letter board, and included in the January 2013 issue • 4 submitted journal papers: 2 in first round; 2 in preparation for the second round • 3 journal papers in preparation • 59 conference papers: 56 published and 3 to appear – Two MMSP Top 10% awards • One standardization contribution • One co-edited book – F. Dufaux, B. Pesquet-Popescu, M Cagnazzo (eds.): Emerging Technologies for 3D Video. Wiley, 2013 • 9 book chapters: 3 published and 6 to appear • According to the Google Scholar web site, my H-index is equal to 13 (update: August 31, 2013) 9
  • 10. VIDEO CODING OPTIMIZATION 1 Standardization contribution 8 Conference papers 1 Submitted journal paper 4 Journal papers
  • 11. Motion vector representation • Quantization of motion vectors to reduce their coding cost • Motion vector refinement and dense motion vector representation generated at the decoder • Lossless coding of segmented motion fields • Motion estimation for wavelet-based video coding 11
  • 12. MC Motion vector quantization ME DCT IDCT Frame Buffer Q 𝜆 𝒗∗ Frame Buffer MC𝐵 𝜃 𝑄 𝑝 𝒗∗ 𝐵(𝑄 𝑝) • M. Cagnazzo, M. Agostini, M. Antonini, G. Laroche, and J. Jung, “Motion vector quantization for efficient low-bitrate video coding,” in SPIE Visual Communications and Image Processing Conference, vol. 7257, (San Jose, California), 2009. • S. Corrado, M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “Improving H.264 performances by quantization of motion vectors,” in Picture Coding Symposium, (Chicago, IL), 2009. • M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “A new coding mode for hybrid video coders based on quantized motion vectors,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, pp. 946–956, July 2011. 12 DecoderEncoder
  • 13. MC Motion vector quantization ME DCT IDCT Frame Buffer Q 𝜆 𝒗∗ Q Frame Buffer MC𝐵 𝜃 𝑄 𝑝 𝒗(𝑄 𝑣) 𝑄 𝑣 𝐵(𝑄 𝑝, 𝑄 𝑣) 13 • M. Cagnazzo, M. Agostini, M. Antonini, G. Laroche, and J. Jung, “Motion vector quantization for efficient low-bitrate video coding,” in SPIE Visual Communications and Image Processing Conference, vol. 7257, (San Jose, California), 2009. • S. Corrado, M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “Improving H.264 performances by quantization of motion vectors,” in Picture Coding Symposium, (Chicago, IL), 2009. • M. Agostini, M. Cagnazzo, M. Antonini, G. Laroche, and J. Jung, “A new coding mode for hybrid video coders based on quantized motion vectors,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, pp. 946–956, July 2011. DecoderEncoder
  • 14. Quantization step for motion vectors • Double-pass approach – Estimation of the best step over a frame – Actual encoding with the selected step • Estimation: – Sum of distortions – Oracle (used as reference) • Results: average rate reduction ≈ 4% with respect to H.264 and ≈ 8% with respect to H.264 1/8-pel NB: All rate reductions for video are measured using the Bjontegaard metric (approximated average rate reduction for the same PSNR over a given interval) 14
  • 15. Differential techniques for ME BMA ME Hybrid Coder H.264 Stream Residual MVs Differential MV refinement Input video Side Info Enhancement Layer Residual Hybrid Coder MVs M. Cagnazzo and B. Pesquet-Popescu, “Introducing differential motion estimation into hybrid video coders,” in SPIE Visual Communications and Image Processing Conference, vol. 1, (Huang Shan, An Hui, China), pp. 1–4, 2010. 15
  • 16. Differential ME in hybrid video coding • Layered representation of video • Base layer compatible with any hybrid technique • Enhancement layer uses costless refined vectors 𝛿𝐯 𝑛, 𝑚 = −𝑒 𝑛,𝑚 𝜆 + 𝝓 𝑛,𝑚 2 𝝓 𝑛,𝑚 • The refinement depends on the motion compensated error image 𝑒 and on the motion compensated reference image gradient 𝝓 • Proof of principle, small improvements (up to almost 1% rate reduction) 16
  • 17. Context quantization • Target: exploit high-order statistical dependencies in segmented motion fields to reduce the coding rate (lossless coding) • Tool: context-based lossless encoder – Implemented with an arithmetic coder • Problem: high-order dependencies  large context  context dilution – I.e. too many contexts, difficult to estimate conditional probabilities • Solution: context quantization • M. Cagnazzo, M. Antonini, and M. Barlaud, “Mutual information-based context quantization,” Signal Proc.: Image Comm. (Elsevier Science), pp. 64–74, Jan. 2010. 17
  • 18. Context quantization • Contexts (i.e. sequences of already encoded symbols) are grouped into classes • Rate increase: the average information loss of including a context into a class ℒ 𝑓 = 𝑝 𝑥 𝐷 𝑝 𝑌 𝑥 ∥ 𝑝 𝑌 𝑓 𝑥 𝑥∈𝒳 𝑥: generic context 𝑌: symbol to encode 𝑓: context quantization function, i.e. context label 18
  • 19. Context quantization • Problem: finding optimum 𝑓 • Classical approach – Start with a set of classes – Move a context from a class 𝑐𝑖 to a class 𝑐𝑗 as far as the relative entropy 𝐷 𝑝 𝑌 𝑥 ∥ 𝑝(𝑌|𝑐𝑖) is larger than 𝐷 𝑝 𝑌 𝑥 ∥ 𝑝(𝑌|𝑐𝑗) – Stopping criterion on the relative improvement of the objective function ℒ(𝑓) or on the number of iterations 19
  • 20. Context quantization • Classical approach – Intuitive, very popular, good results – Some open questions: • Does the basic step actually reduce the cost function at each iteration? • Is it the largest possible reduction? • If not, what is the largest possible reduction, and can we achieve it? • Contribution: answers to these questions 20
  • 21. Context quantization • We found the expression of the cost function variation Δℒ associated to the displacement of a context from a class to another • We proved that with the classical approach, each iteration actually reduces the cost function… • … but not as much as actually possible • We found the best step • Rate reductions: up to 3.6% on motion data and to a further 5% on synthetic data (global minimization based on dynamic programming) 21
  • 22. ME criterion for WT-based video coding • WT video coding is based on temporal transform rather than classical temporal prediction • Therefore MSE-based ME is not assured to be optimal • The optimal criterion is the maximization of the coding gain: CG = 𝑎𝑖 𝑤𝑖 𝜎𝑖 2𝑀 𝑖=1 𝑤𝑖 𝜎𝑖 2 𝑎 𝑖𝑀 𝑖=1 • where 𝑖 is the subband index, 𝜎𝑖 2 the variance, 𝑎𝑖 is the relative number of coefficients, and 𝑤𝑖 the normalization factor of the 𝑖-th subband • M. Cagnazzo, F. Castaldo, T. André, M. Antonini, and M. Barlaud, “Optimal motion estimation for wavelet video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, pp. 907–911, July 2007. 22
  • 23. ME criterion for WT-based video coding • We showed that we only have to minimize 𝜌2 = 𝜎𝑖 2 𝑎 𝑖𝑀 𝑖=1 • In general each MV influences all the subbands, the problem is still complex • However, the CG can be analytically maximized for a particular class of MC-ed lifting schemes, the (𝑁, 0) LS 𝒗 𝑩 ∗ , 𝒗 𝑭 ∗ = argmin 𝑣 𝐵,𝑣 𝐹 ℰ 𝜖 𝐵 + ℰ 𝜖 𝐹 + 2 𝜖 𝐵, 𝜖 𝐹 • Average rate reduction: 8% 23 x7x6x5 x8x3x2x1 x4 h1 h2 l1 l2 h3 h4 l3 l4 x9 … Input frames High-frequency subband Low-frequency subband (2,0) LS NB: All rate reductions for video are measured using the Bjontegaard metric
  • 24. 3D video coding • MVD format : multiple views plus depth • Inter-view and inter-component redundancy • Three contributions for the upcoming standard 3D-HEVC 24
  • 25. Modification of the Merge candidate list for 3D-VC • In the Merge mode, a block is predicted using a vector from a short list (Merge list) • Coding the index list is much less costly than coding the vector • It can be a motion vector or a disparity vector • In 3D-HEVC, MVs are much more frequently selected than DVs • We have proposed to insert a further DV in the Merge list • Several positions in the primary and secondary list have been tested • Best results obtained with the first position of the secondary list • We obtained both a rate reduction (0.6%) and a complexity reduction (4%) • Contribution accepted into the standard • E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. "Modification of the merge candidate list for dependent views in 3D-HEVC". In IEEE International Conference on Image Processing, September 2013. Melbourne, Australia. • E. Mora, B. Pesquet, M. Cagnazzo and J. Jung. Modification of the Merge Candidate List for Dependent Views in 3DV-HTM. Document JCT3V-B0069 for Shanghai meeting (MPEG number m26793). Shanghai (PRC), October 2012. 25
  • 26. Intra mode inheritance for 3D-HEVC • Observation: blocks with strong contours and one dominant direction tend to be encoded with the same Intra directional mode in Texture and Depth • Idea: when coding Depth, add the co-located Intra mode to the Most Probable Mode list when a dominant direction is detected • Dominant direction is revealed by the presence of a single peak in the histogram of the gradient angle for the current block • Procedure (figure): 1. Find the reference texture block 2. Compute the gradient statistics 3. If a dominant direction is detected, add to the MPM list • E. Mora, J. Jung, M. Cagnazzo, and B. Pesquet-Popescu, “Codage de vidéos de profondeur basé sur l’héritage des modes intra de texture,” in Compression et Représentation des Signaux Audiovisuels, vol. 1, (Lille, France), pp. 1–4, 2012. • E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. “Depth Video Coding Based on Intra Mode Inheritance From Texture”. Submitted to APSIPA Transactions on Signal and Information Processing (2013)
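The three steps of the inheritance procedure can be sketched as follows. This is a minimal illustration, not the codec's actual rule: the "single peak" test is approximated by requiring that one histogram bin holds at least `min_share` of the total gradient magnitude, the number of bins and the threshold are arbitrary, the inherited mode is simply appended to the list, and all function names are hypothetical.

```python
import math


def gradient_angle_histogram(block, bins=8):
    """Magnitude-weighted histogram of gradient orientations in [0, pi)."""
    rows, cols = len(block), len(block[0])
    histogram = [0.0] * bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (block[r][c + 1] - block[r][c - 1]) / 2.0  # central differences
            gy = (block[r + 1][c] - block[r - 1][c]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi
            index = min(int(angle / math.pi * bins), bins - 1)
            histogram[index] += magnitude
    return histogram


def dominant_bin(histogram, min_share=0.6):
    """Index of the peak bin if it holds at least min_share of the mass, else None."""
    total = sum(histogram)
    if total == 0.0:
        return None
    peak = max(range(len(histogram)), key=lambda i: histogram[i])
    return peak if histogram[peak] / total >= min_share else None


def mpm_with_inheritance(mpm_list, texture_mode, texture_block):
    """Return the MPM list, extended with the texture Intra mode when warranted."""
    has_direction = dominant_bin(gradient_angle_histogram(texture_block)) is not None
    if has_direction and texture_mode not in mpm_list:
        return mpm_list + [texture_mode]
    return list(mpm_list)
```

A block containing a clean vertical edge puts all its gradient mass in one bin and triggers the inheritance, while a flat block leaves the list unchanged.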
  • 27. Intra mode inheritance for 3D-HEVC • The dominant angle proved to be an effective feature for detecting blocks where inheritance is effective • Inserting the inherited mode in the MPM list allows an average coding rate reduction of ≈1% • Tests performed over MPEG sequences under Common Test Conditions
  • 28. Enhanced quad-tree coding for 3D-HEVC • The 3D-HEVC codec uses quad-trees for encoding texture and depth • These trees are quite correlated • We propose an inter-component coding tool for reducing both complexity and rate by exploiting the quad-tree redundancy • Two variants, according to the component that is encoded first (texture or depth) • Contribution to 3D-HEVC working draft and reference software • E. Mora, J. Jung, M. Cagnazzo, B. Pesquet-Popescu. “Initialization, limitation and predictive coding of the depth and texture quad-tree in 3D-HEVC Video Coding”. Accepted into IEEE Transactions on Circuits and Systems for Video Technology
  • 29. Enhanced quad-tree coding for 3D-HEVC • Observation: texture coding units are very often at least as finely partitioned as depth coding units • Therefore we can limit the depth map partitioning level if we know texture… • … or we can initialize the texture partitioning if we know depth • Complexity reduction (fewer configurations to examine): up to 31% encoder time saving • Rate reduction (easier prediction of coding modes): up to 1.8%
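The limitation variant (texture coded first) can be sketched as a recursive partition search in which a depth block may be split only where the co-located texture block is split. This is a simplified illustration, not the 3D-HEVC reference software: the quad-tree representation (`None` for a leaf, a list of four children for a split), the `cost` callback standing in for the encoder's rate-distortion cost, and the function name are all assumptions.

```python
def search_depth_partition(x, y, size, texture_node, cost):
    """Return (best_cost, depth_tree) for the square block at (x, y).

    A quad-tree node is None (leaf) or a list of four child nodes, in the
    order top-left, top-right, bottom-left, bottom-right.
    """
    unsplit_cost = cost(x, y, size)
    if texture_node is None:
        # Texture is not split here, so a depth split is not even examined.
        return unsplit_cost, None
    half = size // 2
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]
    children = [
        search_depth_partition(x + dx, y + dy, half, child, cost)
        for (dx, dy), child in zip(offsets, texture_node)
    ]
    split_cost = sum(child_cost for child_cost, _ in children)
    if split_cost < unsplit_cost:
        return split_cost, [tree for _, tree in children]
    return unsplit_cost, None
```

The complexity saving comes from the early return: every texture leaf prunes the whole subtree of depth configurations below it, so the cost callback is invoked far fewer times than in an unconstrained search.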
  • 30. Don’t Care Regions • A depth pixel only needs to be reconstructed such that the resulting geometric error leads to an acceptable distortion in the synthesized view • [Figure: error in the synthesized pixel value versus disparity value, with the DCR marked] • G. Valenzise, G. Cheung, R. Galvao, M. Cagnazzo, B. Pesquet-Popescu, and A. Ortega, “Motion prediction of depth video for depth-image-based rendering using Don’t Care Regions,” in Picture Coding Symposium, vol. 1, (Krakow, Poland), pp. 1–4, 2012.
  • 31. DCR Example (Kendo, frame 10, t = 5) 31
  • 32. Don’t Care Regions We embedded DCR into a H.264/AVC encoder, changing three basic aspects: 1. Motion estimation 2. Residual coding 3. Skip mode 32
  • 34. Don’t Care Regions • We compute and encode prediction residuals wrt the DCRs • For SKIP mode, no prediction residuals are coded – The reconstructed values could be far outside the DCR, leading to an arbitrarily high distortion in the synthesized view – We adopt a conservative policy: prevent SKIP selection when any reconstructed pixel is outside its DCR • Results: average rate saving of 7% • High preprocessing complexity 34
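The residual and SKIP rules above can be sketched as follows, assuming that each pixel's DCR is an interval `(low, high)` of acceptable depth values and that the residual "with respect to the DCR" is the smallest correction that brings the prediction back into that interval. Both assumptions, and the function names, are illustrative and not taken from the encoder implementation.

```python
def dcr_residual(predicted, low, high):
    """Smallest signed correction that moves the prediction into [low, high]."""
    if predicted < low:
        return low - predicted
    if predicted > high:
        return high - predicted
    return 0  # already inside the DCR: nothing to code


def skip_allowed(reconstructed, dcr):
    """Conservative policy: SKIP only if every reconstructed pixel is in its DCR."""
    return all(low <= value <= high
               for value, (low, high) in zip(reconstructed, dcr))
```

Because any value inside the interval produces a zero residual, wider DCRs translate directly into more zero residuals and a lower rate.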
  • 35. Other work in 3D video coding • Dense disparity field for MVV and MVD coding • Depth coding using elastic curve model • I. Daribo, M. Kaaniche, W. Miled, M. Cagnazzo, and B. Pesquet-Popescu, “Dense disparity estimation in multiview video coding,” in IEEE Workshop on Multimedia Signal Processing, (Rio de Janeiro, Brazil), 2009. • M. Cagnazzo and B. Pesquet-Popescu, “Depth map coding by dense disparity estimation for MVD compression,” in IEEE Digital Signal Processing, (Corfu, Greece), 2011. • E. Mora, J. Jung, B. Pesquet-Popescu, M. Cagnazzo. "Modification of the disparity vector derivation process in 3D-HEVC". In IEEE Workshop on Multimedia Signal Processing, vol. 1, September 2013. Cagliari, Italy. 35
  • 36. OBJECT-BASED IMAGE CODING 6 Conference papers 2 Journal papers
  • 37. Region-based hyperspectral image coding Multispectral / Hyperspectral Image Map Segmentation (TS-VQ) Map Coding Region Coding • M. Cagnazzo, R. Gaetano, S. Parrilli, and L. Verdoliva, “Region based compression of multispectral images by classified KLT,” in EUSIPCO. 2006. • M. Cagnazzo, R. Gaetano, S. Parrilli, and L. Verdoliva, “Adaptive region-based compression of multispectral images,” in Proceed. of IEEE Intern. Conf. Image Proc., (Atlanta, GA), pp. 3249–3252, Oct. 2006 • M. Cagnazzo, S. Parrilli, G. Poggi, and L. Verdoliva, “Costs and advantages of object-based image coding with shape-adaptive wavelet transform,” EURASIP J. Image Video Proc., 2007 37
  • 38. Region-based hyperspectral image coding • Spectral transform: WT, global KLT, class-based KLT, region-based KLT • Spatial transform: WT, SA-WT • Encoder: SA-SPIHT with optimal rate allocation among objects • Results: – 0.5 dB better than JP2K-Multicomponent – Better post-processing (i.e. classification) results • M. Cagnazzo, G. Poggi, and L. Verdoliva, “Region-based transform coding of multispectral images,” IEEE Trans. on Image Processing, vol. 16, pp. 2916–2926, Dec. 2007. 38
  • 39. Region-based hyperspectral image coding AVIRIS image 32 bands, 0.3 bps (original @16bps) Landsat TM image 6 bands, 0.6 bps (original @8bps) 39
  • 40. Adaptive wavelet and rate allocation • Adaptive wavelets (implemented via lifting schemes) allow changing the filters according to the signal characteristics • Further constraint: reconstruction without sending side information • [Figure: adaptive lifting scheme block diagram (blocks: Split, D, −P, U; signals: x(k), d(k), xa(k) = y00(k), xd(k) = y01(k))] • S. Parrilli, M. Cagnazzo, and B. Pesquet-Popescu, “Distortion evaluation in transform domain for adaptive lifting schemes,” in IEEE Workshop on Multimedia Signal Processing, (Cairns, Australia), pp. 200–205, 2008. • S. Parrilli, M. Cagnazzo, and B. Pesquet-Popescu, “Estimation of quantization noise for adaptive-prediction lifting schemes,” in IEEE Workshop on Multimedia Signal Processing, (Rio de Janeiro, Brazil), 2009.
  • 41. Adaptive wavelet and rate allocation • The resulting transform is highly non-orthogonal • Problem: distortion evaluation in the transform domain in order to perform rate allocation • Solutions for uncorrelated noise – Good error energy evaluation – Performance improvement for ALS up to 3dB – Improved SSIM (+3%) 41 • M. Cagnazzo and B. Pesquet-Popescu, “Perceptual impact of transform coefficients quantization for adaptive lifting schemes,” in International Workshop on Video Processing and Quality Metrics for Consumer Electronics, (Scottsdale, AZ), 2010. • M. Abid, M. Cagnazzo, and B. Pesquet-Popescu, “Image denoising by adaptive lifting schemes,” in European Workshop on Visual Information Processing, vol. 1, (Paris, France), 2010
  • 42. DISTRIBUTED VIDEO CODING 17 Conference papers 2 Submitted journal papers 3 Journal papers
  • 43. Distributed video coding • Coding of many correlated sources • Encoders do not communicate with one another • Same RD performance as centralized coding (in theory only!) • [Figure: DVC codec block diagram. Encoder: Intra coder for key frames (KF); quantizer and Slepian-Wolf coder (turbo encoder, buffer) for Wyner-Ziv frames (WZ). Decoder: Intra decoder, image interpolation producing the SI, turbo decoder, minimum-distortion reconstruction. Outputs: decoded KFs and decoded WZFs]
  • 44. Image interpolation: High-order trajectories for ME in DVC • G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu. "High order motion interpolation for side information improvement in DVC". In International Conference on Acoustics, Speech and Signal Processing, March 2010. Dallas, TX • G. Petrazzuoli, M. Cagnazzo, and B. Pesquet-Popescu, “Fast and efficient side information generation in distributed video coding by using dense motion representation,” in European Signal Processing Conference, (Aalborg, Denmark), 2010. • G. Petrazzuoli, T. Maugey, M. Cagnazzo, and B. Pesquet-Popescu, “Side information refinement for long duration GOPs in DVC,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010. 44 Rate −3.3%
  • 45. Image interpolation: Pel-based motion estimation • Block-based object trajectory used as initialization • Within each block, pixel-by-pixel vectors are obtained by refining the initialization (Cafforio-Rocca algorithm) • Refinement equations have been re-written and solved () since in this case the reference image does not exist • Rate reductions: 3.5% to 6% • M. Cagnazzo, T. Maugey, and B. Pesquet-Popescu, “A differential motion estimation method for image interpolation in distributed video coding,” in International Conference on Acoustics, Speech and Signal Processing, vol. 1, (Taiwan), pp. 1861–1864, 2009. • W. Miled, T. Maugey, M. Cagnazzo, and B. Pesquet-Popescu, “Image interpolation with dense disparity estimation in multiview distributed video coding,” in International Conference on Distributed Smart Cameras, (Como, Italy), 2009. • T. Maugey, W. Miled, M. Cagnazzo, and B. Pesquet-Popescu, “Méthodes denses d’interpolation de mouvement pour le codage vidéo distribué monovue et multivue,” in Colloque GRETSI - Traitement du Signal et des Images, (Dijon (France)), 2009. • M. Cagnazzo, W. Miled, T. Maugey, and B. Pesquet-Popescu, “Image interpolation with edge-preserving differential motion refinement,” in IEEE International Conference on Image Processing, vol. 1, (Cairo, Egypt), pp. 361–364, 2009. 45
  • 47. Local and global SI fusion • Given the WZF, feature points on the reference frames are extracted by SIFT • Matching features allow performing a global motion compensation (first SI) • Local motion compensation (traditional method) is also performed (second SI) • The two SI are merged using partial channel decoding and re-estimating motion • Experiments show average rate reduction of ≈ 25% with respect to literature references • A. Abou-El Ailah, F. Dufaux, M. Cagnazzo, B. Pesquet-Popescu, and J. Farah, “Successive refinement of side information using adaptive search area for long duration GOPs in distributed video coding,” in International Conference on Telecommunications, (Beirut), 2012. • A. Abou-El Ailah, F. Dufaux, M. Cagnazzo, and J. Farah, “Fusion of global and local side information using support vector machine in transform-domain DVC,” in EUSIPCO, vol. 1, (Bucharest, Romania), pp. 1–5, Aug. 2012. • A. Abou-El Ailah, G. Petrazzuoli, J. Farah, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Side Information Improvement in Transform-Domain Distributed Video Coding". In SPIE - Applications of Digital Image Processing, San Diego, CA (USA), Aug. 2012 • A. Abou-El Ailah, F. Dufaux, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Fusion of global and local motion estimation for distributed video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, n. 1, pp. 158-172, Jan. 2013.
  • 48. Multiview DVC • Motion models for temporal image interpolations – High order motion interpolation – Pixel-based motion vector refinement • Multi-hypothesis SI fusion based on observed parity bits and Bayesian classification • [Figure: arrangement of KF and WZ frames along the view and time axes] • G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu. “Novel solutions for side information generation and fusion in multiview distributed video coding”. Submitted to Eurasip Journal of Advances in Signal Processing
  • 49. Multiview DVC • Step 1: produce a temporal estimation with HOMI • Step 2: produce an inter-view estimation with occlusion reduction (use disparity to estimate foreground objects) • Step 3: produce a fusion of the two estimations using Left-Right Consistency Check to remove residual occlusions • Step 4: select one out of these three images as side information
  • 50. Multiview DVC • For one image out of $N$ we ask for parity bits for temporal and inter-view estimation • We compare the number of bits needed for correcting the two estimations: – If they are close, we choose the fusion image – If not, we select the image with the lowest rate • Equivalent to Bayesian decision: $\hat{D} = \arg\max_d P(D = d \mid \delta_R) = \arg\max_d p(\delta_R \mid d)\, P(d) = \arg\max_d f_d(\delta_R)$
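The selection rule can be sketched in two equivalent-looking forms: a threshold test on the rate difference, and an explicit maximization over per-class score functions. The threshold `closeness`, the labels, and the score functions passed to `bayes_select` are hypothetical placeholders; the slide does not give their actual values or shapes.

```python
def select_side_information(rate_temporal, rate_interview, closeness):
    """Threshold rule: fusion when the two rates are close, else the cheaper SI."""
    delta_r = rate_temporal - rate_interview
    if abs(delta_r) <= closeness:
        return "fusion"
    return "temporal" if delta_r < 0 else "interview"


def bayes_select(delta_r, scores):
    """Bayesian form: pick the label d whose score f_d(delta_r) is largest.

    scores maps each label to a callable f_d, understood as the likelihood
    of delta_r under d multiplied by the prior of d.
    """
    return max(scores, key=lambda label: scores[label](delta_r))
```

When the score functions are unimodal and centred on negative, near-zero and positive rate differences respectively, the Bayesian rule reduces to comparing the rate difference against two thresholds, which is the behaviour of the first function.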
  • 51. Multiview DVC • Experiments show that the Bayesian classifier very often selects the best SI • It may be wrong only when the decoding rates are very close to each other, but in that case selecting a suboptimal SI does not degrade performance • Cumulated gain w.r.t. the state of the art: ≈ 9.1% rate reduction
  • 52. Side information effectiveness • Side information is corrected with parity bits to produce the decoded WZ frame • Intuitively, the more similar the SI is to the original image, the fewer parity bits are needed • Traditionally, PSNR between SI and WZF has been used to evaluate the SI quality • However, it is easy to build a toy example where two iso-PSNR images require very different numbers of correction bits • [Figure: two SI images with the same PSNR (29.1 dB). First: parity bits 137 kb, decoded quality 39.3 dB. Second: parity bits 192 kb, decoded quality 35.4 dB] • T. Maugey, J. Gauthier, M. Cagnazzo, B. Pesquet. “Evaluation of side information effectiveness in distributed video coding”. IEEE TCSVT, accepted
  • 53. Side information effectiveness • Questions: why is PSNR not always reliable? Can we find better metrics? • Applications: Hash-based DVC systems, Witsenhausen-Wyner video coding systems, … • New framework for metric comparison based on end-to-end RD performance • Proposed metrics: $\mathrm{SIQ}_a(I_0, I_1) = 10 \log_{10} \dfrac{255^2}{\sum_{\boldsymbol{p}} \left| I_0(\boldsymbol{p}) - I_1(\boldsymbol{p}) \right|^a}$ and $\mathrm{HSIQ}(I_0, I_1) = 10 \log_{10} \dfrac{N_{\mathrm{bits}}}{d_H(I_0, I_1)}$ • SIQ1 and HSIQ improve on PSNR in both theoretical and practical effectiveness measures (Hash-based system: 20% rate reduction) • PSNR works well for homogeneous errors and starts failing for large but spatially concentrated errors • T. Maugey, C. Yaacoub, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Side information enhancement using an adaptive hash-based genetic algorithm in a Wyner-Ziv context,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010
  • 54. IMVS using DVC Views Time All frames are Intra Coded Each image is coded and stored only once Large bandwidth requested Relatively low server space requested 54
  • 55. IMVS using DVC Views Time P-frames are used: all possible frame dependencies are coded Each image is coded many times Smallest bandwidth requested Very large server space requested 55
  • 56. IMVS using DVC Views Time WZ-frames are used: only parity bits are coded Each image is coded and stored only once Trade-off between server space and bandwidth 56
  • 57. IMVS using DVC • [Figure: bandwidth versus server space for the coding options: only Intra; predictive coding (each image coded many times); ideal case (path known at encoding time); WZ coding; operation region]
  • 58. IMVS for MVD using DVC • We proposed several strategies for view-switching • The best (adaptive) achieves a rate reduction of more than 15% with respect to reference methods • G. Petrazzuoli, M. Cagnazzo, F. Dufaux, and B. Pesquet-Popescu, “Using distributed source coding and depth image based rendering to improve interactive multiview video access,” in IEEE International Conference on Image Processing, vol. 1, (Bruxelles, Belgium), pp. 605–608, 2011. • G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Enabling Immersive Visual Communications through Distributed Video Coding". IEEE MMTC E-Letter (May 2013).
  • 59. Other work on DVC • Fusion schemes for multiview DVC • Iterative methods for SI refinement • DVC for multiple-view-plus-depth video • DVC and interactive multiview streaming • Local and global SI fusion • Nine further conference papers • A. Abou-El Ailah, F. Dufaux, J. Farah, M. Cagnazzo, and B. Pesquet-Popescu, “Fusion of global and local motion estimation for distributed video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, n. 1, pp. 158-172, Jan. 2013. • G. Petrazzuoli, M. Cagnazzo, B. Pesquet-Popescu, F. Dufaux. "Enabling Immersive Visual Communications through Distributed Video Coding". IEEE MMTC E-Letter (May 2013). 59
  • 60. ROBUST VIDEO DISTRIBUTION 7 Conference papers 1 Submitted journal paper + 2 in preparation 2 Journal papers
  • 61. ABCD protocol • Problem: reliable diffusion of video on wireless network • Construction of overlays to carry MDC video • Minimization of the number of sent packets (both video and management packets) • First contribution: a reliable extension of the IEEE 802.11 broadcast communication, using a control peer • Once a reliable broadcast channel is provided, the nodes attach to the stream as soon as they hear about it • C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “H.264-based multiple description coding using motion compensated temporal interpolation,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Saint-Malo, France), 2010 • C. Greco, G. Petrazzuoli, M. Cagnazzo, and B. Pesquet-Popescu, “An MDC-based video streaming architecture for mobile networks,” in IEEE Workshop on Multimedia Signal Processing, vol. 1, (Hangzhou, China), pp. 1–4, 2011. • C. Greco and M. Cagnazzo, “A cross-layer protocol for cooperative content discovery over mobile ad-hoc networks,” International Journal of Communication Networks and Distributed Systems, vol. 6, July 2011. • C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “ABCD : Un protocole cross-layer pour la diffusion vidéo dans des réseaux sans fil ad-hoc,” in Colloque GRETSI - Traitement du Signal et des Images, (Bordeaux, France), 2011. 61
  • 62. ABCD protocol • Once a reliable broadcast channel is provided, the nodes attach to the stream as soon as they hear about it • [Figure: source $s$ and peers $p_1$, $p_2$ exchanging Advertisement and Attachment messages]
  • 63. ABCD protocol • Once a reliable broadcast channel is provided, the nodes attach to the stream as soon as they hear about it • [Figure: source $s$ and peers $p_1$, $p_2$, $p_3$, $p_4$ exchanging Video Data, Video Data & Attachment, and Attachment messages]
  • 64. ABCD protocol: parent switch • $p^* = \arg\min_p \left[ w_h h(p) + w_a a(p) + w_d d(p) - w_g g(p) \right]$
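The parent-switch rule is a weighted-cost minimization over candidate parents, which can be sketched as below. The slide does not say what the terms h, a, d and g measure, so they are kept as opaque per-candidate numbers here; the dictionary layout and function names are hypothetical.

```python
def parent_cost(metrics, weights):
    """w_h*h + w_a*a + w_d*d - w_g*g for one candidate parent."""
    return (weights["h"] * metrics["h"]
            + weights["a"] * metrics["a"]
            + weights["d"] * metrics["d"]
            - weights["g"] * metrics["g"])


def choose_parent(candidates, weights):
    """Return the name of the candidate parent with the lowest cost."""
    return min(candidates, key=lambda name: parent_cost(candidates[name], weights))
```

The first three terms act as penalties and the last one as a reward, so raising `weights["g"]` makes candidates with a large g more attractive even if their other metrics are worse.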
  • 66. ABCD/CoDiO • ABCD may suffer from high delay in large, crowded networks • To reduce the delay, we introduced a Congestion-Distortion Optimization (CoDiO) in the per-hop wireless broadcast transmission • We adjust the RTS/CTS retry limit $k$ of each packet in a Co-Di optimized fashion • Small values of $k$ reduce the congestion but the distortion increases, as the probability of obtaining the channel is lower • High values of $k$ lower the distortion, but congestion increases due to the channel occupation • Cost function: $J(k) = D(k) + \lambda C(k)$ • C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “Low-latency video streaming with congestion control in mobile ad-hoc networks,” IEEE Transactions on Multimedia, vol. 14, n. 4, pp. 1337-1350, Aug. 2012. Paper selected as “High quality paper” by the IEEE MMTC-R Letter board
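The choice of the retry limit can be sketched as a one-dimensional minimization of the Lagrangian cost. The distortion and congestion models are passed in as callables because the slide does not specify them; the function name and the models used in the usage note are purely illustrative.

```python
def best_retry_limit(distortion, congestion, lam, max_retries):
    """Return the retry limit k in 0..max_retries minimizing D(k) + lam * C(k).

    distortion: callable k -> expected distortion (assumed decreasing in k)
    congestion: callable k -> expected congestion (assumed increasing in k)
    lam:        Lagrange multiplier trading congestion against distortion
    """
    return min(range(max_retries + 1),
               key=lambda k: distortion(k) + lam * congestion(k))
```

With the toy models D(k) = 100 / (k + 1) and C(k) = k, a multiplier of 4 selects k = 4, while a multiplier of 0 selects the largest allowed k and a very large multiplier selects k = 0, matching the trade-off described on the slide.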
  • 67. ABCD/CoDiO Challenges: • Model the effects of a single-node decision on the entire network • Even if a node switches off, alternative paths may be formed • Information about alternative paths is gathered at leaves and conveyed upstream • The information is refined where it actually matters, i.e. near the root – where a single decision affects a lot of nodes 67
  • 69. Network coding for video delivery • Network coding allows increasing network throughput by letting intermediate nodes process packets instead of simply relaying them • NC can easily be extended to wireless networks
  • 70. Network coding • Using ABCD as overlay to implement NC in wireless network • Optimized scheduling for MDC in Expanded Window NC • Optimized scheduling for multiview video over NC • Blind source separation for reducing the NC overhead 70
  • 71. Network coding for video delivery • RDO-scheduling in NC-based delivery • A generation is composed of the frames of a multi-view GOP or an MDC GOP • Each node must decide the schedule of frames • I. Nemoianu, C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “A framework for joint multiple description coding and network coding over wireless ad-hoc networks,” in International Conference on Acoustics, Speech and Signal Processing, (Kyoto, Japan), 2012 • I. Nemoianu, C. Greco, M. Cagnazzo, and B. Pesquet-Popescu, “A network coding scheduling for multiple description video streaming over wireless networks,” in EUSIPCO, vol. 1, (Bucharest, Romania), pp. 1–5, Aug. 2012. • I. Nemoianu, C. Greco, M. Cagnazzo, B. Pesquet-Popescu. "Multi-View Video Streaming over Wireless Networks with RD-Optimized Scheduling of Network Coded Packets". In SPIE Visual Communications and Image Processing Conference, San Diego, CA (USA), Nov. 2012.
  • 72. Network coding for video delivery • RDO calls for a unique scheduling (send first the frame that maximally reduces the RD cost function) • NC calls for different scheduling at each node (pseudo-random selection) in order to maximize the throughput • Solution: collect frames into groups with “similar” RD characteristics, and randomly select within a group
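The compromise between RD-optimized and randomized scheduling can be sketched as follows. The grouping rule (frames whose RD cost reduction is within `tolerance` of the best frame in the group) is an assumption made for illustration, as are the function names and the frame labels in the usage note.

```python
import random


def cluster_by_rd(frames, tolerance):
    """Group frames with similar RD cost reduction, most valuable group first.

    frames: dict mapping a frame id to the RD cost reduction it brings.
    """
    ordered = sorted(frames, key=lambda frame: frames[frame], reverse=True)
    clusters = []
    for frame in ordered:
        if clusters and frames[clusters[-1][0]] - frames[frame] <= tolerance:
            clusters[-1].append(frame)
        else:
            clusters.append([frame])
    return clusters


def next_frame(clusters, sent, rng):
    """Pick at random inside the most valuable group that still has unsent frames."""
    for cluster in clusters:
        pending = [frame for frame in cluster if frame not in sent]
        if pending:
            return rng.choice(pending)
    return None
```

Each node would call `next_frame` with its own `random.Random` instance: the order of the groups preserves the RD priority, while the random pick inside a group keeps the schedules of different nodes diverse.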
  • 73. BSS for NC • In NC the intermediate nodes of a network send linear combinations of the packets they have previously received, with random coefficients taken from a finite field • The random coefficients must be added to the packet as headers, incurring an overhead • In a blind source separation (BSS) based approach, it could be possible to relieve the nodes from the need to include the coefficients in the packets • BSS consists in recovering a set of source signals 𝑆 from a set of mixed signals 𝑋 = 𝑓(𝑆), also referred to as observations, without knowing the sources themselves nor the mixing process parameters; in NC we have linear mixing, 𝑋 = 𝐴𝑆 • I. Nemoianu, C. Greco, M. Castella, B. Pesquet-Popescu, M. Cagnazzo. "On a practical approach to source separation over finite fields for network coding applications". In International Conference on Acoustics, Speech and Signal Processing, May 2013. Vancouver, Canada. 73
  • 74. BSS for NC • Literature BSS approach in finite fields: – Iterative scan of packet combinations – Minimization of a contrast function • Our idea: add to packets a signature that is degraded by linear combination • Then, the contrast function can be computed only on candidates having a valid signature • Problems: how to choose the signature to reduce the probability that a linear combination of packets still carries a valid signature • Simple solution: odd-parity bit • Drastic reduction of the search space 74
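The effect of an odd-parity signature can be illustrated over GF(2), where a linear combination of packets is a bitwise XOR. This is only a toy sketch under that assumption (the cited work addresses finite fields more generally), and the function names are hypothetical.

```python
def add_signature(payload_bits):
    """Append one bit so that the whole packet has an odd number of ones."""
    parity_bit = 1 - (sum(payload_bits) % 2)
    return payload_bits + [parity_bit]


def has_valid_signature(packet_bits):
    """A packet is valid when its total number of ones is odd."""
    return sum(packet_bits) % 2 == 1


def combine(packets):
    """GF(2) linear combination with all coefficients equal to 1: bitwise XOR."""
    mixed = [0] * len(packets[0])
    for packet in packets:
        mixed = [m ^ b for m, b in zip(mixed, packet)]
    return mixed
```

Since the parity of an XOR is the XOR of the parities, any mix of an even number of signed packets has even parity and is rejected without evaluating the contrast function. A mix of an odd number of packets still passes the check, so the signature shrinks the search space rather than eliminating every false candidate.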
  • 76. Perspectives • “Classical” video coding: advanced models for rate control • 3D VC: – combined use of motion and disparity compensation to produce improved reference frames; – elastic deformation model for lossless coding of depth contours • DVC: – Improved SI generation using an elastic deformation model for estimating object shapes; – Geometry-based DVC system for MVD (no backward channel, no channel coding) • NC and streaming: use of “social” information to optimize interactive multiview streaming with a NC approach 76
  • 77. New themes • Forensic, forgery detection • Feature representation and compression • Video protection • Immersive communications: holoscopy / holography, high dynamic range 77