A Tutorial on MPEG/Audio Compression
Davis Pan, IEEE Multimedia Journal, Summer 1995
Presented by: Randeep Singh Gakhal
CMPT 820, Spring 2004
Outline
 Introduction
 Technical Overview
 Polyphase Filter Bank
 Psychoacoustic Model
 Coding and Bit Allocation
 Conclusions and Future Work
Introduction
 What does MPEG-1 Audio provide?
A lossy but perceptually transparent audio compression system
that exploits the weaknesses of the human ear.
 Can provide compression by a factor of 6 and
retain sound quality.
 One part of a three part standard that includes
audio, video, and audio/video synchronization.
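To make the factor-of-6 figure concrete, the arithmetic can be sketched as below. The 16-bit stereo input is an assumption chosen for illustration (the slides give only the sampling rates and the compression factor), and the function names are hypothetical.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio in kbit/s.

    bits_per_sample and channels are illustrative assumptions (CD-style audio).
    """
    return sample_rate_hz * bits_per_sample * channels / 1000


def compressed_bit_rate_kbps(pcm_kbps, factor=6):
    """Bit rate after compressing by the given factor."""
    return pcm_kbps / factor


pcm_rate = pcm_bit_rate_kbps(44_100)            # uncompressed stereo PCM
mpeg_rate = compressed_bit_rate_kbps(pcm_rate)  # after 6:1 compression
print(f"PCM: {pcm_rate:.1f} kbps, compressed: {mpeg_rate:.1f} kbps")
```

Under these assumptions the compressed stream lands in the low hundreds of kbps for a stereo pair, which is the same order of magnitude as the per-layer bit rates quoted in the deck.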
Technical Overview
MPEG-I Audio Features
 PCM sampling rate of 32, 44.1, or 48 kHz
 Four channel modes:
 Monophonic and Dual-monophonic
 Stereo and Joint-stereo
 Three modes (layers in MPEG-I speak):
 Layer I: Computationally cheapest, bit rates > 128kbps
 Layer II: Bit rate ~ 128 kbps, used in VCD
 Layer III: Most complicated encoding/decoding, bit rates ~
64kbps, originally intended for streaming audio
Human Audio System (ear + brain)
 Human sensitivity to sound is non-linear
across audible range (20Hz – 20kHz)
 Audible range broken into regions where
humans cannot discern minute differences in frequency
 called the critical bands
MPEG-I Encoder Architecture [1] (figure not reproduced)
MPEG-I Encoder Architecture
 Polyphase Filter Bank: Transforms PCM samples
to frequency domain signals in 32 subbands
 Psychoacoustic Model: Calculates acoustically
irrelevant parts of signal
 Bit Allocator: Allots bits to subbands according to
input from psychoacoustic calculation.
 Frame Creation: Generates an MPEG-I compliant
bit stream.
The Polyphase Filter Bank
Polyphase Filter Bank
 Divides audio signal into 32 equal width
subband streams in the frequency domain.
 Inverse filter at decoder cannot recover
signal without some, albeit inaudible, loss.
 Based on work by Rothweiler[2].
 Standard specifies 512 coefficient analysis
window, C[n]
Polyphase Filter Bank
 Buffer X[n] of 512 PCM samples; 32 new samples are shifted in every computation cycle
 Calculate window samples for i = 0…511:
$$Z[i] = C[i] \cdot X[i]$$
 Partial calculation for i = 0…63:
$$Y[i] = \sum_{j=0}^{7} Z[i + 64j]$$
 Calculate 32 subband samples for i = 0…31:
$$S[i] = \sum_{k=0}^{63} M[i][k] \cdot Y[k]$$
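The three steps above can be sketched in NumPy as follows. This is a minimal sketch, not the reference implementation: the window `C` is a placeholder (a Hann window) because the standard's 512-coefficient table is not reproduced in the slides, the newest-first ordering in `shift_in` is an assumption, and all function names are hypothetical. The matrix uses the deck's definition M[i][k] = cos((2i + 1)(k - 16)π/64).

```python
import numpy as np

N_SUBBANDS = 32
WINDOW_LEN = 512


def analysis_matrix():
    """M[i][k] = cos((2i + 1)(k - 16) * pi / 64), shape (32, 64)."""
    i = np.arange(N_SUBBANDS)[:, None]
    k = np.arange(64)[None, :]
    return np.cos((2 * i + 1) * (k - 16) * np.pi / 64)


def shift_in(X, new_samples):
    """Shift 32 new PCM samples into the 512-sample buffer.

    Assumption: the newest sample ends up at X[0] and the oldest 32 drop out.
    """
    return np.concatenate([new_samples[::-1], X[:-N_SUBBANDS]])


def analyze(X, C, M):
    """Return the 32 subband samples for the current buffer contents."""
    Z = C * X                          # Z[i] = C[i] * X[i], i = 0..511
    Y = Z.reshape(8, 64).sum(axis=0)   # Y[i] = sum_j Z[i + 64j], i = 0..63
    return M @ Y                       # S[i] = sum_k M[i][k] * Y[k]


# Placeholder window: NOT the C[n] table from the standard.
C = np.hanning(WINDOW_LEN)
M = analysis_matrix()

rng = np.random.default_rng(0)
new = rng.standard_normal(N_SUBBANDS)       # 32 fresh PCM samples
X = shift_in(np.zeros(WINDOW_LEN), new)
S = analyze(X, C, M)
print(S.shape)
```

Reshaping `Z` to 8 rows of 64 puts `Z[i + 64j]` at position `[j, i]`, so summing over rows gives the partial sums without an explicit loop.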
Polyphase Filter Bank
 Visualization of the filter [1] (figure not reproduced)
Polyphase Filter Bank
 The net effect:
$$S[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} M[i][k] \, C[k + 64j] \, X[k + 64j]$$
 Analysis matrix:
$$M[i][k] = \cos\left[\frac{(2i + 1)(k - 16)\pi}{64}\right]$$
 Requires 512 + 32 × 64 = 2560 multiplies for each set of 32 subband samples.
 Each subband has bandwidth π/32T centered at odd multiples of π/64T, where T is the sampling period
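The bandwidth and multiply counts can be checked with a few lines of arithmetic. In the sketch below, π/32T rad/s is converted to Hz by dividing by 2π, which gives fs/64 per subband with centers at odd multiples of fs/128. The function names are hypothetical.

```python
def subband_layout(sample_rate_hz, n_subbands=32):
    """Return (bandwidth in Hz, list of subband center frequencies in Hz)."""
    bandwidth_hz = sample_rate_hz / (2 * n_subbands)   # pi/32T rad/s = fs/64 Hz
    centers_hz = [(2 * i + 1) * bandwidth_hz / 2 for i in range(n_subbands)]
    return bandwidth_hz, centers_hz


def multiplies_per_block(window_len=512, n_subbands=32, matrix_cols=64):
    """Windowing multiplies plus matrix multiplies for 32 subband outputs."""
    return window_len + n_subbands * matrix_cols


bandwidth, centers = subband_layout(48_000)
total = multiplies_per_block()
print(bandwidth, centers[0], centers[-1], total, total / 32)
```

At a 48 kHz sampling rate every subband is equally wide, which is the property the deck later lists as a shortcoming relative to the critical bands.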
Polyphase Filter Bank
 Shortcomings:
 Equal width filters do not correspond with critical
band model of auditory system.
 Filter bank and its inverse are NOT lossless.
 Frequency overlap between subbands.
Polyphase Filter Bank
 Comparison of filter banks and critical bands [1] (figure not reproduced)
 Frequency response of one subband [1] (figure not reproduced)
Psychoacoustic Model
The Weakness of the Human Ear
 Frequency dependent resolution:
 We do not have the ability to discern minute
differences in frequency within the critical bands.
 Auditory masking:
 When two signals of very close frequency are
both present, the louder will mask the softer.
 A masked signal must be louder than some
threshold for it to be heard, which gives us room to
introduce inaudible quantization noise.
MPEG-I Psychoacoustic Models
 MPEG-I standard defines two models:
 Psychoacoustic Model 1:
 Less computationally expensive
 Makes some serious compromises in what it
assumes a listener cannot hear
 Psychoacoustic Model 2:
 Provides more features suited for Layer III
coding, at the cost of more computation.
Psychoacoustic Model
 Convert samples to frequency domain
 Apply a Hann window, then a DFT
 The window reduces the edge artifacts caused by the finite window size.
 Model 1 uses a 512-sample (Layer I) or 1024-sample (Layers II and III) window.
 Model 2 uses a 1024-sample window and two calculations per frame.
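The time-to-frequency step can be sketched with NumPy as below. This is an illustration only: the normalization and reference level used by the standard are not reproduced here, the small `eps` guard is an implementation convenience, and `power_spectrum_db` is a hypothetical name.

```python
import numpy as np


def power_spectrum_db(samples, eps=1e-12):
    """Hann-weight a block of PCM samples and return its power spectrum in dB."""
    n = len(samples)
    window = np.hanning(n)                    # Hann weighting
    spectrum = np.fft.rfft(window * samples)  # DFT of the real-valued block
    power = np.abs(spectrum) ** 2
    return 10 * np.log10(power + eps)         # eps avoids log10(0)


fs = 48_000                      # sampling rate in Hz
n = 512                          # Layer I window size for Model 1
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 3000 * t)   # 3 kHz test tone

spectrum_db = power_spectrum_db(tone)
peak_bin = int(np.argmax(spectrum_db))
print(peak_bin * fs / n)         # frequency of the strongest bin, in Hz
```

A 512-sample block yields 257 non-redundant bins, each 93.75 Hz wide at 48 kHz, so the 3 kHz tone falls exactly on a bin.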
Psychoacoustic Model
 Need to separate sound into “tones” and “noise”
components
 Model 1:
 Local peaks are tones, lump remaining spectrum per
critical band into noise at a representative frequency.
 Model 2:
 Calculate “tonality” index to determine likelihood of each
spectral point being a tone
 based on previous two analysis windows
Psychoacoustic Model
 “Smear” each signal within its critical band
 Use either a masking (Model 1) or a spreading
function (Model 2).
 Adjust calculated threshold by incorporating
a “quiet” mask – masking threshold for
each frequency when no other frequencies
are present.
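A minimal sketch of the smearing step and the quiet-threshold adjustment follows, assuming a triangular spreading function on a critical-band axis. The slopes, the 10 dB masker-to-threshold offset, and the function names are hypothetical placeholders, not the tables from the MPEG-1 standard, and taking the per-band maximum with the quiet threshold is only one plausible way to incorporate it.

```python
import numpy as np


def spread_db(dz, lower_slope=25.0, upper_slope=10.0):
    """Attenuation in dB at a distance dz (in bands) from the masker.

    Hypothetical triangular shape: steeper below the masker than above it.
    """
    return np.where(dz < 0, lower_slope * dz, -upper_slope * dz)


def masking_threshold_db(band_levels_db, quiet_db, offset_db=10.0):
    """Smear each band's level over its neighbours, then apply the quiet mask."""
    levels = np.asarray(band_levels_db, dtype=float)
    z = np.arange(len(levels))
    dz = z[:, None] - z[None, :]               # maskee index minus masker index
    contrib_db = levels[None, :] + spread_db(dz) - offset_db
    mask_power = (10 ** (contrib_db / 10)).sum(axis=1)   # add maskers in power
    mask_db = 10 * np.log10(mask_power)
    return np.maximum(mask_db, quiet_db)       # never below threshold in quiet


levels = np.full(10, -100.0)    # ten bands, all essentially silent
levels[4] = 60.0                # except one strong component in band 4
quiet = np.zeros(10)            # flat quiet threshold, for illustration only
threshold = masking_threshold_db(levels, quiet)
print(np.round(threshold, 1))
```

Bands far from the masker fall back to the quiet threshold, while nearby bands inherit an attenuated copy of the masker's level.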
Psychoacoustic Model
 Calculate a masking threshold for each subband in the
polyphase filter bank
 Model 1:
 Selects minima of masking threshold values in range of each
subband
 Inaccurate at higher frequencies – recall how subbands are
linearly distributed, critical bands are NOT!
 Model 2:
 If subband wider than critical band:
 Use minimal masking threshold in subband
 If critical band wider than subband:
 Use average masking threshold in subband
Psychoacoustic Model
 The hard work is done – now, we just
calculate the signal-to-mask ratio (SMR)
per subband
 SMR = signal energy / masking threshold
 We pass our result on to the coding unit
which can now produce a compressed
bitstream
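The per-subband threshold selection and the SMR can be sketched together as below. This follows the Model 1 rule from the deck (take the minimum masking threshold within each subband). Treating signal energy as the sum of the bin powers in the subband is an assumption of this sketch, as is the function name.

```python
import numpy as np


def smr_per_subband_db(bin_power, bin_threshold, n_subbands=32):
    """Signal-to-mask ratio in dB for each subband.

    bin_power and bin_threshold are linear (not dB) values per FFT bin; their
    length must be a multiple of n_subbands (e.g. 256 bins for 32 subbands).
    """
    power = np.asarray(bin_power, dtype=float).reshape(n_subbands, -1)
    threshold = np.asarray(bin_threshold, dtype=float).reshape(n_subbands, -1)
    signal_energy = power.sum(axis=1)        # assumed definition of signal energy
    min_threshold = threshold.min(axis=1)    # Model 1: minimum within the subband
    return 10 * np.log10(signal_energy / min_threshold)


bin_power = np.ones(256)                 # flat spectrum, 8 bins per subband
bin_threshold = np.full(256, 0.1)
bin_threshold[3] = 0.01                  # one bin in subband 0 is masked less
smr_db = smr_per_subband_db(bin_power, bin_threshold)
print(np.round(smr_db[:2], 2))
```

Because the minimum is used, a single weakly masked bin raises the SMR of its whole subband, which in turn attracts more bits during allocation.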
Psychoacoustic Model (example)
 Figures from [1], not reproduced here:
 Input
 Transformation to perceptual domain
 Calculation of masking thresholds
 Signal-to-mask ratios
 What we actually send
Coding and Bit Allocation
Layer Specific Coding
 Layer specific frame formats [1] (figure not reproduced)
 Stream of samples is processed in groups [1] (figure not reproduced)
Layer I Coding
 Group 12 samples from each subband and
encode them in each frame (=384 samples)
 Each group encoded with 0-15 bits/sample
 Each group has 6-bit scale factor
Layer II Coding
 Similar to Layer I except:
 Each frame now carries 3 groups of 12 samples per subband = 1152 samples per frame
 Can have up to 3 scale factors per subband (one per group); the encoder shares a scale factor between groups unless that would cause audible distortion
 Scale factor selection information (SCFSI) tells the decoder how the scale factors are shared
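The frame sizes quoted for Layers I and II follow directly from the grouping, as this short sketch shows. The frame durations are derived here from the sampling rate and are not stated in the slides; the function names are hypothetical.

```python
SUBBANDS = 32
SAMPLES_PER_GROUP = 12


def samples_per_frame(groups_per_subband):
    """PCM samples represented by one frame."""
    return SUBBANDS * SAMPLES_PER_GROUP * groups_per_subband


def frame_duration_ms(groups_per_subband, sample_rate_hz):
    """Duration of audio covered by one frame, in milliseconds."""
    return 1000 * samples_per_frame(groups_per_subband) / sample_rate_hz


layer1 = samples_per_frame(1)   # Layer I: one group of 12 per subband
layer2 = samples_per_frame(3)   # Layer II: three groups of 12 per subband
print(layer1, layer2)
print(frame_duration_ms(1, 48_000), frame_duration_ms(3, 48_000))
```

Longer frames spread the fixed header and side information over more samples, at the price of coarser time resolution per frame.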
Layer III Coding
 Further subdivides subbands using Modified
Discrete Cosine Transform (MDCT), which is lossless in the
absence of quantization
 Larger frequency resolution => smaller time
resolution
 possibility of pre-echo
 Layer III encoder can detect and reduce pre-echo
by “borrowing bits” from the bit reservoir (bits donated by past frames)
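That the MDCT loses nothing before quantization can be checked numerically. The sketch below uses a textbook MDCT with a sine window and 50% overlap-add. The block size N = 18, the window choice, and the function names are assumptions for illustration, not the Layer III specification.

```python
import numpy as np


def mdct_basis(N):
    """Cosine basis of shape (N, 2N) mapping 2N inputs to N coefficients."""
    n = np.arange(2 * N)[None, :]
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct(block, window, basis):
    """Forward transform of one windowed block of 2N samples."""
    return basis @ (window * block)


def imdct(coeffs, window, basis):
    """Inverse transform; the output still contains time-domain aliasing."""
    N = basis.shape[0]
    return window * (basis.T @ coeffs) * 2 / N


def roundtrip(x, N):
    """Transform overlapping blocks and overlap-add the inverse transforms.

    len(x) must be a multiple of N.
    """
    window = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))  # sine window
    basis = mdct_basis(N)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):   # hop of N samples
        block = padded[start:start + 2 * N]
        out[start:start + 2 * N] += imdct(mdct(block, window, basis), window, basis)
    return out[N:N + len(x)]


rng = np.random.default_rng(1)
x = rng.standard_normal(90)          # 5 hops of N = 18 samples
y = roundtrip(x, 18)
print(np.max(np.abs(x - y)))         # reconstruction error
```

Each block alone is not invertible, since 2N samples map to N coefficients; the aliasing cancels only when neighbouring blocks are overlapped and added, so all loss in Layer III comes from quantization.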
Bit Allocation
 Determine number of bits to allot for each
subband given SMR from psychoacoustic model.
 Layers I and II:
 Calculate mask-to-noise ratio:
 MNR = SNR – SMR (in dB)
 SNR given by MPEG-I standard (as function of quantization
levels)
 Now iterate until no bits to allocate left:
 Allocate bits to subband with lowest MNR.
 Re-calculate MNR for subband allocated more bits.
Bit Allocation
 Layer III:
 Employs “noise allocation”
 Quantizes each spectral value and employs
Huffman coding
 If Huffman encoding results in noise in excess of
allowed distortion for a subband, encoder
increases resolution on that subband
 Whole process repeats until one of three
specified stop conditions is met.
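The greedy loop described for Layers I and II (give bits to the subband with the lowest MNR, then recompute) can be sketched as below. Several simplifications are assumed: the standard's SNR table is replaced by a 6.02 dB-per-bit rule of thumb, each extra bit is charged 12 bits (one per sample in a group), and scale factor and side-information costs are ignored. The function name is hypothetical.

```python
def allocate_bits(smr_db, bit_budget, samples_per_group=12, max_bits=15,
                  snr_per_bit_db=6.02):
    """Greedy bit allocation: repeatedly serve the subband with the lowest MNR.

    smr_db: signal-to-mask ratio per subband, in dB.
    Returns the number of bits per sample assigned to each subband.
    """
    bits = [0] * len(smr_db)
    remaining = bit_budget

    def mnr_db(sb):
        # MNR = SNR - SMR, with SNR approximated as 6.02 dB per bit.
        return snr_per_bit_db * bits[sb] - smr_db[sb]

    while remaining >= samples_per_group:
        candidates = [sb for sb in range(len(bits)) if bits[sb] < max_bits]
        if not candidates:
            break                               # every subband is at its maximum
        worst = min(candidates, key=mnr_db)     # lowest mask-to-noise ratio
        bits[worst] += 1                        # one more bit per sample
        remaining -= samples_per_group          # cost: 12 samples x 1 bit
    return bits


allocation = allocate_bits([20.0, 10.0, 0.0, -5.0], bit_budget=60)
print(allocation)
```

Subbands whose SMR is already at or below zero tend to receive nothing, because their quantization noise is masked even without any bits.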
Conclusions and Future Work
Conclusions
 MPEG-I provides tremendous compression
for relatively cheap computation.
 Not suitable for archival or audiophile grade
music as very seasoned listeners can
discern distortion.
 Modifying or searching MPEG-I content
requires decompression and is not cheap!
Future Work
 MPEG-1 audio lays the foundation for all modern
audio compression techniques
 Lots of progress since then (1994!)
 MPEG-2 (1996) extends MPEG audio
compression to support 5.1 channel audio
 MPEG-4 (1998) attempts to code based on
perceived audio objects in the stream
 Finally, MPEG-7 (2001) operates at an even
higher level of abstraction, focusing on meta-data
coding to make content searchable and
retrievable
References
[1] D. Pan, “A Tutorial on MPEG/Audio Compression”,
IEEE Multimedia Journal, Summer 1995.
[2] J. H. Rothweiler, “Polyphase Quadrature Filters – a New
Subband Coding Technique”, Proc. of the Int. Conf. IEEE
ASSP, 27.2, pp. 1280-1283, Boston, 1983.
