1. Source Coding
Wireless Ad Hoc Networks
University of Tehran, Dept. of E&CE,
Fall 2007,
Farshad Lahouti
Media Basics
Contents:
Brief introduction to digital media
(Audio/Video)
Digitization
Compression
Representation
Standards
1
2. Signal Digitization
Pulse Code Modulation (PCM)
Sampling
Sampling theory – Nyquist theorem
The discrete-time sequence of a sampled continuous function { V(tn) } contains enough information to reproduce the function V = V(t) exactly, provided that the sampling rate is at least twice the highest frequency contained in the original signal V(t).
Analog signal sampled at a constant rate
telephone: 4 kHz signal BW, 8,000 samples/sec
CD music: 22 kHz signal BW, 44,100 samples/sec
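The Nyquist condition can be checked numerically. In the sketch below (frequencies chosen purely for illustration), a 5 kHz tone sampled at 8 kHz violates the theorem, and its samples coincide with those of a phase-flipped 3 kHz alias, so the original tone cannot be recovered:

```python
import math

fs = 8_000                      # sampling rate (Hz); Nyquist limit is fs/2 = 4 kHz
n = range(32)

# Samples of a 5 kHz tone (above Nyquist) and of a 3 kHz tone (below it)
s_5k = [math.sin(2 * math.pi * 5_000 * k / fs) for k in n]
s_3k = [math.sin(2 * math.pi * 3_000 * k / fs) for k in n]

# The 5 kHz samples equal the negated 3 kHz samples: the tone has aliased
assert all(abs(a + b) < 1e-9 for a, b in zip(s_5k, s_3k))
```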
2
3. Quantization
Discretization along energy axis
At every time interval the signal is converted to a digital equivalent
Using 2 bits, the following signal can be digitized (figure)
Digitization Examples
Each sample quantized, i.e., rounded, to one of e.g. 2^8 = 256 possible quantized values
Each quantized value represented by bits: 8 bits for 256 values
Example: 8,000 samples/sec, 256 quantized values --> 64,000 bps
Receiver converts it back to analog signal: some quality reduction
Example rates
CD: 1.411 Mbps (16 bits/sample, stereo)
Internet telephony: 5.3 - 13 kbps
MP3: 96, 128, 160 kbps
3
4. Approximate size for 1 second audio (file sizes in kilobits)
Channels   Resolution   Fs         File Size
Mono       8-bit        8 kHz      64 kb
Stereo     8-bit        8 kHz      128 kb
Mono       16-bit       8 kHz      128 kb
Stereo     16-bit       16 kHz     512 kb
Stereo     16-bit       44.1 kHz   1411 kb
Stereo     24-bit       44.1 kHz   2117 kb
1 CD (700 MB): 70-80 mins
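Each row of the table is just Fs × bits/sample × channels; a minimal sketch checking a few entries (sizes in bits for one second of audio):

```python
# Bits needed for one second of uncompressed PCM audio
def audio_bits_per_second(fs_hz, bits_per_sample, channels):
    return fs_hz * bits_per_sample * channels

assert audio_bits_per_second(8_000, 8, 1) == 64_000        # mono, 8-bit, 8 kHz
assert audio_bits_per_second(8_000, 8, 2) == 128_000       # stereo, 8-bit, 8 kHz
assert audio_bits_per_second(44_100, 16, 2) == 1_411_200   # CD-quality stereo
```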
Lossy and lossless Compression
Lossless compression (more later)
Data Compression
APE (MonkeyAudio)
Image compression for biomedical applications
…
Lossy compression
Hide errors where humans will not see or hear them
Study the hearing and vision systems to understand how we see/hear
Perceptual Coding
4
5. Requirements for Compression Algorithms
Lossless compression
Decoded signal is mathematically equivalent to the original one
Drawback: achieves only a small or modest level of compression
Lossy compression
Decoded signal is of a lower quality than the original one
Advantage: achieves a very high degree of compression
Objective: maximize the degree of compression while maintaining a certain quality
General compression requirements
Ensure a good quality of decoded signal
Achieve high compression ratios
Minimize the complexity of the encoding and decoding process
Support multiple channels
Support various data rates
Give small delay in processing
Compression Tools
Transform Coding
Variable Rate Coding
Entropy Coding
Huffman Coding
Run-length Coding
Predictive Coding
DPCM
ADPCM
5
6. Variable Length Coding
Ignores semantics of input data and compresses media
streams by regarding them as sequences of digits or
symbols
Examples: run-length encoding, Huffman encoding, ...
Run-length encoding
A compression technique that replaces consecutive
occurrences of a symbol with the symbol followed by the
number of times it is repeated
a a a a a => 5a
000000000000000000001111111 => 0x20 1x7
Most useful where symbols appear in long runs: e.g., images with areas where all the pixels have the same value, faxes, and cartoons.
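A minimal sketch of run-length encoding as described above, using (count, symbol) pairs (real coders pack the pairs into a more compact bit-level format):

```python
from itertools import groupby

def rle_encode(s):
    # Replace each run of a repeated symbol with a (count, symbol) pair
    return [(len(list(g)), ch) for ch, g in groupby(s)]

def rle_decode(pairs):
    return "".join(ch * n for n, ch in pairs)

assert rle_encode("aaaaa") == [(5, "a")]                        # a a a a a => 5a
assert rle_encode("0" * 20 + "1" * 7) == [(20, "0"), (7, "1")]  # 0x20 1x7
assert rle_decode(rle_encode("aaabccccd")) == "aaabccccd"       # round trip
```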
Entropy coding
A few words about Entropy
Entropy
A measure of information content
Entropy of the English Language
How much information does each character in "typical" English text contain?
From a probability view
If the probability of a binary event is 0.5 (like a coin flip), then, on average, you need one bit to represent the result of this event.
As the probability of a binary event increases or decreases, the number of bits you need, on average, to represent the result decreases.
The figure expresses that unless an event is totally random, you can convey the information of the event in fewer bits, on average, than it might first appear.
6
7. Entropy (Shannon 1948)
For a set of messages S with probability p(s), s ∈S, the
self information of s is:
i(s) = log(1/p(s)) = −log p(s)
measured in bits if the log is base 2.
The lower the probability, the higher the self-information.
Entropy is the weighted average of self-information:
H(S) = ∑s∈S p(s) log(1/p(s))
Entropy Example
p(S) = {0.25, 0.25, 0.25, 0.125, 0.125}
H(S) = 3 × 0.25 log 4 + 2 × 0.125 log 8 = 2.25
p(S) = {0.5, 0.125, 0.125, 0.125, 0.125}
H(S) = 0.5 log 2 + 4 × 0.125 log 8 = 2
p(S) = {0.75, 0.0625, 0.0625, 0.0625, 0.0625}
H(S) = 0.75 log(4/3) + 4 × 0.0625 log 16 ≈ 1.3
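The entropy formula and the three worked examples can be verified directly; a minimal sketch:

```python
from math import log2

def entropy(probs):
    # H(S) = sum over s of p(s) * log2(1/p(s)): average self-information in bits
    return sum(p * log2(1 / p) for p in probs if p > 0)

assert abs(entropy([0.25, 0.25, 0.25, 0.125, 0.125]) - 2.25) < 1e-9
assert abs(entropy([0.5, 0.125, 0.125, 0.125, 0.125]) - 2.0) < 1e-9
assert round(entropy([0.75, 0.0625, 0.0625, 0.0625, 0.0625]), 1) == 1.3
```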
7
8. Statistical (Entropy) Coding
Entropy Coding
• Lossless coding
• Takes advantage of the probabilistic nature of information
• Example: Huffman coding, arithmetic coding
Theorem (Shannon)
(lower bound): for any probability distribution p(S) with associated uniquely decodable code C,
H(S) ≤ la(C)
where la(C) is the average codeword length of C.
Recall Huffman coding…
Huffman Coding
A popular compression technique that assigns variable length
codes to symbols, so that the most frequently occurring symbols
have the shortest codes
Huffman coding is particularly effective where the data are
dominated by a small number of symbols
Suppose to encode a source of N =8 symbols: {a,b,c,d,e,f,g,h}
The probabilities of these symbols are: P(a) = 0.01, P(b)=0.02,
P(c)=0.05, P(d)=0.09, P(e)=0.18, P(f)=0.2, P(g)=0.2, P(h)=0.25
If we assign 3 bits per symbol (N = 2^3 = 8), the average length of the symbols is 3 bits/symbol.
The theoretical lowest average length – the entropy:
H(P) = −∑i P(i) log2 P(i) = 2.57 bits/symbol
If we use Huffman encoding, the average length = 2.63 bits/symbol
8
9. Huffman Coding (Cont’d)
The Huffman code assignment procedure is based on a binary tree
structure. This tree is developed by a sequence of pairing operations
in which the two least probable symbols are joined at a node to form
two branches of a tree. More precisely:
1. The list of probabilities of the source symbols is associated with the leaves of a binary tree.
2. Take the two smallest probabilities in the list and generate an
intermediate node as their parent and label the branch from
parent to one of the child nodes 1 and the branch from parent to
the other child 0.
3. Replace the probabilities and associated nodes in the list by the
single new intermediate node with the sum of the two probabilities.
If the list contains only one element, quit. Otherwise, go to step 2.
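The pairing procedure above can be sketched with a priority queue. Rather than building the tree explicitly, this version tracks only codeword lengths (each merge pushes every symbol under the new node one level deeper), which is enough to verify the 2.63 bits/symbol average quoted for the 8-symbol example:

```python
import heapq

def huffman_lengths(probs):
    # Repeatedly join the two least probable nodes; return codeword lengths
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, syms1 = heapq.heappop(heap)
        p2, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:          # one level deeper under the new parent
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, syms1 + syms2))
    return lengths

# P(a) .. P(h) from the example
p = [0.01, 0.02, 0.05, 0.09, 0.18, 0.2, 0.2, 0.25]
lengths = huffman_lengths(p)
avg = sum(pi * li for pi, li in zip(p, lengths))
assert abs(avg - 2.63) < 1e-9            # matches the quoted average length
```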
Huffman Coding (Cont’d)
9
10. Huffman Coding (Cont’d)
The new average length of the source is 2.63 bits/symbol.
The efficiency of this code is H/la = 2.57/2.63 ≈ 97.7%.
How do we estimate the P(i) ? Relative frequency of the symbols
How to decode the bit stream ? Share the same Huffman table
How to decode the variable length codes ? Prefix codes have the
property that no codeword can be the prefix (i.e., an initial segment)
of any other codeword. Huffman codes are prefix codes !
11010000000010001 => ?
Do the best possible codes guarantee to always reduce the size of sources? No, worst cases exist; Huffman coding is only better on average.
Huffman coding is particularly effective where the data are dominated
by a small number of symbols
Transform Coding
How to do frequency analysis? In the time domain? Not easy!
Time domain -> transform domain
Sequence to be coded is converted into new sequence
using a transformation rule.
New sequence - transform coefficients.
Process is reversible - get back to original sequence
using inverse transformation.
Example - Fourier transform (FT)
Coefficients represent proportion of energy
contributed by different frequencies.
10
11. Transform Coding (Cont…)
In transform coding - choose transformation such that
only subset of coefficients have significant values.
Energy confined to subset of ‘important’ coefficients.
Known as ‘energy compaction’.
Example - FT of bandlimited signal:
Differential Coding – DPCM & ADPCM
Based on the fact that neighboring samples …, x(n-1), x(n), x(n+1), … in a discrete time sequence change slowly in many applications, e.g., voice, audio, …
A differential PCM coder (DPCM) quantizes and encodes the
difference d(n) = x(n) – x(n-1)
Advantage of using difference d(n) instead of the actual
value x(n)
Reduce the number of bits to represent a sample
General DPCM: d(n) = x(n) – a1x(n-1) - a2x(n-2) -…- akx(n-k)
a1, a2, …ak are fixed
Adaptive DPCM: a1, a2, …ak are dynamically changed with
signal
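A minimal lossless sketch of first-order DPCM (the differences are transmitted unquantized here; a real DPCM coder quantizes d(n), which is where the loss enters):

```python
def dpcm_encode(x):
    # Send x(0) as-is, then d(n) = x(n) - x(n-1)
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]

def dpcm_decode(d):
    x = [d[0]]
    for diff in d[1:]:
        x.append(x[-1] + diff)           # x(n) = x(n-1) + d(n)
    return x

samples = [100, 102, 104, 103, 101, 101, 99]   # slowly changing, as in voice
diffs = dpcm_encode(samples)
assert diffs == [100, 2, 2, -1, -2, 0, -2]     # small values: fewer bits needed
assert dpcm_decode(diffs) == samples           # perfect reconstruction
```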
11
12. Psychoacoustic
Human aural response
Psychoacoustic Model
Basically: If you can’t hear the sound, don’t encode it
Natural Bandlimiting
Audio perception spans 20 Hz - 20 kHz, but most sounds concentrate in the low frequencies (e.g., 2 kHz to 4 kHz)
Human frequency response:
Frequency masking: If a stronger sound and weaker
sound compete, you can’t hear the weaker sound. Don’t
encode it.
Temporal masking: After a loud sound, there’s a while
before we can hear a soft sound.
Stereo redundancy: At low frequencies, we can’t detect
where the sound is coming from. Encode it mono.
12
13. Perceptual Coding: Examples
MP3 = MPEG 1/2 layer 3 audio; achieves CD quality in about 192 kbps (roughly a 7:1 compression ratio relative to 1.411 Mbps); higher compression possible
Sony MiniDisc uses Adaptive Transform Coding (ATRAC) to achieve a 5:1 compression ratio (about 282 kbps)
http://www.mpeg.org
http://www.minidisc.org/aes_atrac.html
Artefacts of compression
Some areas of the spectrum are lost in the
encoding process
MP3-encoded recordings rarely sound identical to the original uncompressed audio files
On small or PC speakers, however, MP3-compressed audio can be acceptable
13
15. MP3 file (3 MB)
LPC and Parametric Coding
LPC (Linear Predictive Coding)
Based on the human utterance organ model
s(n) = a1s(n-1) + a2s(n-2) +…+ aks(n-k) + e(n)
Estimate a1, a2, …ak and e(n) for each piece (frame) of
speech
Encode and transmit/store a1, a2, …ak and type of e(n)
Decoder reproduces speech using a1, a2, …ak and e(n)
Very low bit rate but relatively low speech quality
Parametric coding:
Only coding parameters of sound generation model
LPC is an example where parameters are a1, a2, …ak , e(n)
Music instrument parameters: pitch, loudness, timbre, …
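The prediction idea behind LPC can be illustrated on a synthetic signal (the order-2 coefficients a1, a2 and the noise level below are arbitrary choices for the sketch): once the model coefficients are known, the residual e(n) left after prediction carries far less energy than the signal itself, which is why transmitting the parameters plus a compact description of e(n) yields a very low bit rate:

```python
import random

random.seed(0)
a1, a2 = 1.3, -0.4                       # illustrative stable AR(2) model
e = [random.gauss(0, 0.1) for _ in range(200)]   # small excitation

# Synthesize speech-like samples: s(n) = a1*s(n-1) + a2*s(n-2) + e(n)
s = [0.0, 0.0]
for n in range(2, 200):
    s.append(a1 * s[n - 1] + a2 * s[n - 2] + e[n])

# Residual after prediction with the model coefficients
resid = [s[n] - (a1 * s[n - 1] + a2 * s[n - 2]) for n in range(2, 200)]

def energy(v):
    return sum(x * x for x in v)

assert energy(resid) < energy(s[2:])     # residual is much cheaper to encode
```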
15
16. Speech Compression
Handling speech with other media information such as
text, images, video, and data is the essential part of
multimedia applications
The ideal speech coder has a low bit-rate, high perceived
quality, low signal delay, and low complexity.
Delay
Less than 150 ms one-way end-to-end delay for a conversation
Processing (coding) delay, network delay
Over Internet, ISDN, PSTN, ATM, …
Complexity
Computational complexity of speech coders depends on algorithms
Contributes to achievable bit rate and processing delay
G.72x Speech Coding Standards
Quality
"intelligible" -> "natural" or "subjective" quality
Depending on bit rate
Bit-rate
16
17. G.72x Audio Coding Standards
Silence Compression - detect the "silence", similar to run-length coding
Adaptive Differential Pulse Code Modulation (ADPCM)
e.g., in CCITT G.721 -- 16 or 32 kb/s.
(a) Encodes the difference between two or more consecutive samples; the difference is then quantized -> hence the loss
(b) Adapts the quantization so fewer bits are used when the value is smaller.
It is necessary to predict where the waveform is headed -> difficult
Linear Predictive Coding (LPC) fits signal to speech
model and then transmits parameters of model --> sounds
like a computer talking, 2.4 Kb/s.
Video Digitization and Compression
Video is sequence of images (frames) displayed at
constant frame rate
e.g. 24 images/sec
Digital image is a 2-D array of pixels
Each pixel represented by bits
R:G:B
Y:U:V
Y = 0.299R + 0.587G + 0.114B (Luminance or Brightness)
U = B - Y (Chrominance 1, color difference)
V = R - Y (Chrominance 2, color difference)
Redundancy
Spatial
Temporal
17
18. Intra-frame coding
Transform Quantize Encode
JPEG (Joint Photographic Experts Group)
Original size
640x480x3=922KB
JPEG Compression Ratios:
30:1 to 50:1 compression is possible with small to moderate defects
100:1 compression is quite feasible for low-quality purposes
JPEG Steps
1 Block Preparation:
From RGB to YUV (YIQ) planes
8x8 blocks
2 Transform:
2-D Discrete Cosine Transform (DCT) on blocks (lossy?)
3 Quantization:
Quantize DCT Coefficients (lossy)
4 Encoding of Quantized Coefficients (lossless)
Zigzag Scan
Differential Pulse Code Modulation (DPCM) on DC component
Run Length Encoding (RLE) on AC Components
Entropy Coding: Huffman or Arithmetic
18
19. JPEG Transform Quantize Encode
Compression: Block Preparation -> Transform -> Quantize -> Encode
Decompression: reverse the order
(1) Block Preparation
RGB Input Data After Block Preparation
Input image: 640 x 480 RGB (24 bits/pixel), transformed to three planes:
Y: (640 x 480, 8 bits/pixel) luminance (brightness) plane.
U, V: (320 x 240, 8 bits/pixel) chrominance (color) planes.
19
20. (2) Discrete Cosine Transform (DCT)
A transformation from spatial domain to frequency domain (similar to FFT)
Definition of the 8-point DCT:
F[u,v] = (C(u)C(v)/4) ∑x=0..7 ∑y=0..7 f[x,y] cos((2x+1)uπ/16) cos((2y+1)vπ/16), where C(0) = 1/√2 and C(k) = 1 for k > 0
F[0,0] is the DC component and the other F[u,v] define the AC components of the DCT
The 64 (8 x 8) DCT basis functions (figure; the DC component is at u = v = 0)
Block-based 2-D DCT
Karhunen-Loeve (KL) transform?
20
21. 8x8 DCT Example
Figure: original values of an 8x8 block (in the spatial domain) and the corresponding DCT coefficients (in the frequency domain); the DC component is at the top-left (u = v = 0).
(3) Quantization of DCT Coefficients
Uniform quantization: divide by a constant N and round the result.
In JPEG, each DCT coefficient F[u,v] is divided by a constant q(u,v) from a quantization table (a filter?):
Quantized value = Round( F[u,v] / q(u,v) )
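Steps (2) and (3) can be sketched straight from the 8-point DCT definition plus uniform quantization. A single constant divisor stands in for the per-coefficient table q(u,v) here; a flat input block is used so the expected result (all energy in the DC coefficient) is easy to check:

```python
import math

def dct2_8x8(block):
    # F[u,v] = C(u)C(v)/4 * sum_x sum_y f[x,y] cos((2x+1)u*pi/16) cos((2y+1)v*pi/16)
    C = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    F = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for x in range(8) for y in range(8))
            F[u][v] = C(u) * C(v) * s / 4
    return F

def quantize(F, q=16):
    # Uniform quantization: divide by a constant and round (the lossy step)
    return [[round(F[u][v] / q) for v in range(8)] for u in range(8)]

flat = [[100] * 8 for _ in range(8)]             # featureless 8x8 block
F = dct2_8x8(flat)
assert abs(F[0][0] - 800) < 1e-6                 # all energy in the DC term
assert all(abs(F[u][v]) < 1e-6
           for u in range(8) for v in range(8) if (u, v) != (0, 0))
assert quantize(F)[0][0] == 50                   # round(800 / 16)
```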
21
22. (4) Zigzag Scan
Maps an 8x8 block into a 1 x 64 vector
The zigzag pattern groups low-frequency coefficients at the top of the vector.
(5) Encoding of Quantized
DCT Coefficients
DC Components:
DC component of a block is large and varied, but often
close to the DC value of the previous block.
Encode the difference of DC component from previous 8x8
blocks using Differential Pulse Code Modulation (DPCM).
AC components:
The 1x64 vector has lots of zeros in it.
Using RLE, encode as (skip, value) pairs, where skip is the
number of zeros and value is the next non-zero component.
Send (0,0) as end-of-block value.
22
23. (6) Runlength Coding
A typical 8x8 block of quantized DCT coefficients.
Most of the higher order coefficients have been quantized to 0.
12 34 0 54 0 0 0 0
87 0 0 12 3 0 0 0
16 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Zig-zag scan: the sequence of DCT coefficients to be transmitted:
12 34 87 16 0 0 54 0 0 0 0 0 0 12 0 0 3 0 0 0 .....
DC coefficient (12) is sent via a separate Huffman table.
Run-length coding of the remaining coefficients:
34 | 87 | 16 | 0 0 54 | 0 0 0 0 0 0 12 | 0 0 3 | 0 0 0 .....
Further compression: statistical (entropy) coding
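The zigzag scan and the (skip, value) run-length coding of exactly this block can be sketched and checked (the DC coefficient 12 is kept out of the RLE, as it is coded separately):

```python
def zigzag(block):
    # Visit the 8x8 positions along anti-diagonals, alternating direction
    order = sorted(((x, y) for x in range(8) for y in range(8)),
                   key=lambda p: (p[0] + p[1],
                                  -p[0] if (p[0] + p[1]) % 2 == 0 else p[0]))
    return [block[x][y] for x, y in order]

def rle_ac(vec):
    # Encode AC coefficients as (skip, value); (0, 0) marks end of block
    pairs, skip = [], 0
    for v in vec[1:]:                    # vec[0] is the DC coefficient
        if v == 0:
            skip += 1
        else:
            pairs.append((skip, v))
            skip = 0
    pairs.append((0, 0))
    return pairs

block = [[12, 34, 0, 54, 0, 0, 0, 0],
         [87, 0, 0, 12, 3, 0, 0, 0],
         [16, 0, 0, 0, 0, 0, 0, 0]] + [[0] * 8 for _ in range(5)]
v = zigzag(block)
assert v[:17] == [12, 34, 87, 16, 0, 0, 54, 0, 0, 0, 0, 0, 0, 12, 0, 0, 3]
assert rle_ac(v) == [(0, 34), (0, 87), (0, 16), (2, 54), (6, 12), (2, 3), (0, 0)]
```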
JPEG Example
Figure: the quantization table used, the original image, and compressed versions at compression ratios 7.7, 12.3, 33.9, and 60.1; blocking artifacts become visible at the higher ratios (addressed by JPEG 2000?).
23
25. Video compression: A big picture
Intra-coded (I), predicted (P), and bi-directionally predicted (B) frames, e.g. in the pattern:
I B B P B B P B B P B B I
Group of frames (GOF)
Q: 3D Transform Coding ?
25
26. VBR vs CBR: Rate Control
Variable-Bit-Rate (VBR)
Fixed quantizer Qp
"Constant" quality
E.g. RMVB
Constant-Bit-Rate (CBR)
Adaptive quantizer, driven by a rate controller on the encoder's smoothing buffer
"Constant" rate – easier control (difference compared to target rate can be 0.5% or less)
E.g. RM, MPEG-1
Rate-distortion optimization
Recall that the transport layer also has rate control …
Standardization Organizations
ITU-T VCEG (Video Coding Experts
Group)
standards for advanced moving image
coding methods appropriate for
conversational and non-conversational
audio/visual applications.
ISO/IEC MPEG (Moving Picture
Experts Group)
standards for compression and coding,
decompression, processing, and
coded representation of moving
pictures, audio, and their combination
WG – working group; SG – sub group
Relation between the standards:
ITU-T H.262 ~ ISO/IEC 13818-2 (MPEG-2): Generic Coding of Moving Pictures and Associated Audio
ITU-T H.263 ~ ISO/IEC 14496-2 (MPEG-4 part 2)
ISO/IEC JTC 1/SC 29/WG 1: Coding of Still Pictures
ISO/IEC JTC 1/SC 29/WG 11: Coding of Moving Pictures and Audio (MPEG)
26
27. Coding Rate and Standards
Very low bitrate (8–64 kbit/s): MPEG-4, H.263 – mobile videophone, videophone over PSTN
Low bitrate (64–384 kbit/s): H.261 – ISDN videophone
Medium bitrate (~1.5 Mbit/s): MPEG-1 – Video CD
High bitrate (5–20 Mbit/s): MPEG-2 – Digital TV, HDTV
ISO MPEG-1 (Moving Pictures Experts Group).
MPEG-1
Progressively scanned video for multimedia applications, at a bit rate of about 1.5 Mb/s, matching the access rate of CD-ROM players.
Video format: near-VHS quality
27
28. ISO MPEG-2
MPEG-2
Standard for Digital Television,
DVD
4 to 8 Mb/s (10 to 15 Mb/s for higher quality), much higher than MPEG-1
Supports various modes of scalability (spatial, temporal, SNR)
There are differences in quantization and better variable-length code tables for progressive video sequences.
ISO MPEG-4
A much broader standard. MPEG-4 was aimed primarily at low-bit-rate video communication, but is not limited to it.
Applications:
1. Digital television
2. Interactive graphics applications
3. Interactive multimedia (World Wide Web)
Two versions in the Internet world: DivX 3 and DivX 4
Important concept
Video object
28
29. MPEG-4 Object Video
Instead of ”frames”: Video Object Planes
Shape Adaptive DCT
Figure: a video frame is segmented, via an alpha map, into a background VOP and foreground VOPs; each VOP is coded with the shape-adaptive DCT (SA-DCT).
MPEG-4 Structure
The bitstream is demultiplexed (MUX) into A/V objects; each A/V object is handled by its own decoder, and a compositor assembles the decoded objects into the audio/video scene.
29
30. Example
Object 3
Object 1
Object 4
Object 2
Problems, comments?
Another Example
30
31. Status
Microsoft, RealVideo, QuickTime, ...
But only rectangular-frame based
H.264 = MPEG-4 part 10 (2003)
Shape coding
Synthetic scenes
H.264
H.26x (x=1,2,3)
ITU-T Recommendations
Real time video communication applications.
MPEG Standards
Video storage, broadcast video, video streaming applications
H.26L = ITU-T + MPEG = JVT coding
Latest project of the Joint Video Team formed by ITU-T SG16 Q6 (VCEG) and ISO/IEC JTC 1/SC 29/WG 11 (MPEG)
Basic configuration similar to H.263 and MPEG-4 Part 2
31
32. H.264 Design
Goals
Enhanced compression performance
Provision of a network-friendly, packet-based video representation addressing conversational and non-conversational applications
Conceptual separation between the Video Coding Layer (VCL) and the Network Adaptation Layer (NAL)
H.264 Design (Cont'd)
Video Coding Layer: produces control data and macro-block data; data partitioning organizes them into slices/partitions, which are handed to the Network Adaptation Layer.
32
33. H.264 Design ( Contd.)
Video Coding Layer
Core: a high-compression representation
Block-based, motion-compensated transform video coder
New features enabled to achieve significant improvement in coding efficiency.
Network Adaptation Layer
Provides the ability to customize the format of the VCL data over a
variety of networks
Unique packet based interface
Packetisation and appropriate signaling is a part of NAL
specification
Video Coding Evolution
H.264
Y. Wang, J. Ostermann, Y.-Q. Zhang, Digital Video
Processing and Communication. Prentice Hall, 2001.
33