2. Outline
Introduction
Technical Overview
Polyphase Filter Bank
Psychoacoustic Model
Coding and Bit Allocation
Conclusions and Future Work
3. Introduction
What does MPEG-1 Audio provide?
A lossy audio compression system that exploits
the weaknesses of the human ear to remain
perceptually transparent.
Can compress by a factor of about 6 while
retaining sound quality.
One part of a three part standard that includes
audio, video, and audio/video synchronization.
5. MPEG-I Audio Features
PCM sampling rate of 32, 44.1, or 48 kHz
Four channel modes:
Monophonic and Dual-monophonic
Stereo and Joint-stereo
Three modes (layers in MPEG-I speak):
Layer I: Computationally cheapest, bit rates > 128 kbps
Layer II: Bit rate ~ 128 kbps, used in VCD
Layer III: Most complicated encoding/decoding, bit rates ~
64 kbps, originally intended for streaming audio
6. Human Audio System (ear + brain)
Human sensitivity to sound is non-linear
across the audible range (20 Hz – 20 kHz)
The audible range breaks into regions, called
critical bands, within which humans cannot
perceive frequency differences
8. MPEG-I Encoder Architecture
Polyphase Filter Bank: Transforms PCM samples
to frequency domain signals in 32 subbands
Psychoacoustic Model: Calculates acoustically
irrelevant parts of signal
Bit Allocator: Allots bits to subbands according to
input from psychoacoustic calculation.
Frame Creation: Generates an MPEG-I compliant
bit stream.
10. Polyphase Filter Bank
Divides audio signal into 32 equal width
subband streams in the frequency domain.
Inverse filter at decoder cannot recover
signal without some, albeit inaudible, loss.
Based on work by Rothweiler[2].
Standard specifies 512 coefficient analysis
window, C[n]
11. Polyphase Filter Bank
Buffer of 512 PCM samples with 32 new
samples, X[n], shifted in every computation cycle
Calculate window samples for i = 0…511:
Z[i] = C[i] · X[i]
Partial calculation for i = 0…63:
Y[i] = Σ_{j=0..7} Z[i + 64j]
Calculate 32 subband samples for i = 0…31:
S[i] = Σ_{k=0..63} Y[k] · M[i][k]
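The three steps above can be sketched in plain Python. This is a minimal, illustrative version: the real 512-coefficient analysis window C is tabulated in the standard, so here the caller must supply it.

```python
import math

def analysis_matrix():
    # M[i][k] = cos((2i + 1)(k - 16) * pi / 64)
    return [[math.cos((2 * i + 1) * (k - 16) * math.pi / 64)
             for k in range(64)]
            for i in range(32)]

def polyphase_analysis(X, C, M):
    """One computation cycle: a 512-sample buffer X and the
    512-coefficient analysis window C produce 32 subband samples."""
    # Step 1: window the buffer, Z[i] = C[i] * X[i] for i = 0..511
    Z = [C[i] * X[i] for i in range(512)]
    # Step 2: partial sums, Y[i] = sum over j=0..7 of Z[i + 64j]
    Y = [sum(Z[i + 64 * j] for j in range(8)) for i in range(64)]
    # Step 3: S[i] = sum over k=0..63 of Y[k] * M[i][k]
    return [sum(Y[k] * M[i][k] for k in range(64)) for i in range(32)]
```

In a full encoder this runs once per cycle, after shifting 32 new PCM samples into the buffer.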
13. Polyphase Filter Bank
The net effect:
S[i] = Σ_{k=0..63} Σ_{j=0..7} M[i][k] · C[k + 64j] · X[k + 64j]
Analysis matrix:
M[i][k] = cos((2i + 1)(k − 16)π / 64)
Requires 512 + 32 × 64 = 2560 multiplies.
Each subband has bandwidth π/32T, centered at
odd multiples of π/64T
14. Polyphase Filter Bank
Shortcomings:
Equal width filters do not correspond with critical
band model of auditory system.
Filter bank and its inverse are NOT lossless.
Frequency overlap between subbands.
18. The Weakness of the Human Ear
Frequency dependent resolution:
We do not have the ability to discern minute
differences in frequency within the critical bands.
Auditory masking:
When two signals of very close frequency are
both present, the louder will mask the softer.
A masked signal must be louder than some
threshold to be heard, which gives us room to
introduce inaudible quantization noise.
19. MPEG-I Psychoacoustic Models
MPEG-I standard defines two models:
Psychoacoustic Model 1:
Less computationally expensive
Makes some serious compromises in what it
assumes a listener cannot hear
Psychoacoustic Model 2:
Provides more features suited for Layer III
coding, assuming of course, increased processor
bandwidth.
20. Psychoacoustic Model
Convert samples to frequency domain
Use a Hann weighting and then a DFT
This gives a frequency-domain representation
free of edge artifacts (from the finite window size).
Model 1 uses 512 (Layer I) or 1024 (Layers II
and III) sample window.
Model 2 uses a 1024 sample window and two
calculations per frame.
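As a sketch, the windowing-plus-DFT step might look like this in Python. The direct O(N²) DFT is for clarity only; a real implementation would use an FFT.

```python
import cmath
import math

def hann(N):
    # Hann weighting: w[n] = 0.5 * (1 - cos(2*pi*n / N))
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / N)) for n in range(N)]

def windowed_dft(x):
    """Apply a Hann window to a block of PCM samples, then take its DFT."""
    N = len(x)
    w = hann(N)
    xw = [w[n] * x[n] for n in range(N)]
    # Direct DFT: X[k] = sum over n of xw[n] * e^(-2*pi*i*k*n/N)
    return [sum(xw[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
            for k in range(N)]
```

Model 1 would call this on 512-sample (Layer I) or 1024-sample (Layers II/III) blocks.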
21. Psychoacoustic Model
Need to separate sound into “tones” and “noise”
components
Model 1:
Local peaks are tones, lump remaining spectrum per
critical band into noise at a representative frequency.
Model 2:
Calculate “tonality” index to determine likelihood of each
spectral point being a tone
based on previous two analysis windows
22. Psychoacoustic Model
“Smear” each signal within its critical band
Use either a masking (Model 1) or a spreading
function (Model 2).
Adjust calculated threshold by incorporating
a “quiet” mask – masking threshold for
each frequency when no other frequencies
are present.
23. Psychoacoustic Model
Calculate a masking threshold for each subband in the
polyphase filter bank
Model 1:
Selects the minimum of the masking threshold values in
the range of each subband
Inaccurate at higher frequencies – recall how subbands are
linearly distributed, critical bands are NOT!
Model 2:
If subband wider than critical band:
Use minimal masking threshold in subband
If critical band wider than subband:
Use average masking threshold in subband
24. Psychoacoustic Model
The hard work is done – now, we just
calculate the signal-to-mask ratio (SMR)
per subband
SMR = signal energy / masking threshold
We pass our result on to the coding unit
which can now produce a compressed
bitstream
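In code the SMR calculation is a one-liner per subband; expressing the ratio in decibels is a common convention (an assumption here, not stated on the slide above).

```python
import math

def smr_db(signal_energy, masking_threshold):
    # SMR = signal energy / masking threshold, expressed in dB
    return 10.0 * math.log10(signal_energy / masking_threshold)

def smr_per_subband(energies, thresholds):
    """One SMR value for each of the 32 polyphase subbands."""
    return [smr_db(e, t) for e, t in zip(energies, thresholds)]
```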
33. Layer I Coding
Group 12 samples from each subband and
encode them in each frame (12 × 32 = 384 samples)
Each group encoded with 0-15 bits/sample
Each group has 6-bit scale factor
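A rough sketch of the grouping idea. This is illustrative only: a real Layer I encoder picks the scale factor from a table indexed by the 6-bit code and uses the quantizers defined in the standard, whereas here the scale factor is simply the group's peak magnitude and the quantizer is a plain uniform one.

```python
def encode_group(samples, bits):
    """Quantize 12 subband samples with one shared scale factor."""
    assert len(samples) == 12
    scale = max(abs(s) for s in samples) or 1.0  # shared scale factor
    levels = (1 << bits) - 1
    # Map each sample from [-scale, scale] onto 'levels' uniform steps
    codes = [round((s / scale + 1.0) / 2.0 * levels) for s in samples]
    return scale, codes

def decode_group(scale, codes, bits):
    levels = (1 << bits) - 1
    return [(c / levels * 2.0 - 1.0) * scale for c in codes]
```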
34. Layer II Coding
Similar to Layer I except:
Groups are now 3 sets of 12 samples per subband =
1152 samples per frame
Can have up to 3 scale factors per subband to
avoid audible distortion in special cases
Called scale factor selection information (SCFSI)
35. Layer III Coding
Further subdivides subbands using Modified
Discrete Cosine Transform (MDCT) – a lossless
transform
Larger frequency resolution => smaller time
resolution => possibility of pre-echo
Layer III encoder can detect and reduce pre-echo
by “borrowing bits” from future encodings
36. Bit Allocation
Determine number of bits to allot for each
subband given SMR from psychoacoustic model.
Layers I and II:
Calculate mask-to-noise ratio:
MNR = SNR – SMR (in dB)
SNR given by MPEG-I standard (as function of quantization
levels)
Now iterate until no bits to allocate left:
Allocate bits to subband with lowest MNR.
Re-calculate MNR for subband allocated more bits.
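The iteration above can be sketched as a greedy loop. The names and the toy SNR table here are illustrative; the real SNR-versus-bits table comes from the MPEG-1 standard.

```python
def allocate_bits(smr, snr_table, bit_pool):
    """Greedy Layer I/II style allocation: repeatedly give one more
    bit to the subband with the lowest mask-to-noise ratio."""
    bits = [0] * len(smr)

    def mnr(i):
        # MNR = SNR - SMR (in dB); more bits => higher SNR => higher MNR
        return snr_table[bits[i]] - smr[i]

    while bit_pool > 0:
        # Subbands that can still take another bit
        open_bands = [i for i in range(len(bits))
                      if bits[i] + 1 < len(snr_table)]
        if not open_bands:
            break
        worst = min(open_bands, key=mnr)  # lowest MNR = sounds worst
        bits[worst] += 1
        bit_pool -= 1
    return bits
```

With a 10 dB SMR in band 0 and 0 dB in band 1, the loop keeps feeding band 0 bits until its MNR overtakes band 1's.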
37. Bit Allocation
Layer III:
Employs “noise allocation”
Quantizes each spectral value and employs
Huffman coding
If Huffman encoding results in noise in excess of
allowed distortion for a subband, encoder
increases resolution on that subband
Whole process repeats until one of three
specified stop conditions is met.
39. Conclusions
MPEG-I provides tremendous compression
for relatively cheap computation.
Not suitable for archival or audiophile grade
music as very seasoned listeners can
discern distortion.
Modifying or searching MPEG-I content
requires decompression and is not cheap!
40. Future Work
MPEG-1 audio lays the foundation for all modern
audio compression techniques
Lots of progress since then (1994!)
MPEG-2 (1996) extends MPEG audio
compression to support 5.1 channel audio
MPEG-4 (1998) attempts to code based on
perceived audio objects in the stream
Finally, MPEG-7 (2001) operates at an even
higher level of abstraction, focusing on meta-data
coding to make content searchable and
retrievable
41. References
[1] D. Pan, "A Tutorial on MPEG/Audio Compression",
IEEE MultiMedia, 1995.
[2] J. H. Rothweiler, "Polyphase Quadrature Filters – A New
Subband Coding Technique", Proc. IEEE Int. Conf. ASSP,
paper 27.2, pp. 1280–1283, Boston, 1983.