2. Introduction
Digital Audio Compression
Removal of redundant or otherwise irrelevant
information from audio signal
Audio compression algorithms are often referred to as
“audio encoders”
Applications
Reduces required storage space
Reduces required transmission bandwidth
3. Audio Compression
Audio signal – overview
Sampling rate (# of samples per second)
Bit rate (# of bits per second). Typically, an
uncompressed stereo 16-bit 44.1 kHz signal has a
bit rate of about 1.4 Mbps (megabits per second)
Number of channels (mono / stereo / multichannel)
Reduction by lowering those values or by data
compression / encoding
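The uncompressed figure above follows directly from the three parameters on this slide. A minimal sketch of the arithmetic (the function name is only illustrative):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Stereo, 16-bit, 44.1 kHz
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)              # 1411200 bits per second
print(cd_rate / 1_000_000)  # 1.4112 megabits per second
print(cd_rate / 8 / 1000)   # 176.4 kilobytes per second
```

Lowering any of the three inputs lowers the bit rate proportionally, which is the "reduction by lowering those values" option mentioned above.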
4. Audio Data Compression
Redundant information
Implicit in the remaining information
Ex. oversampled audio signal
oversampling is the process of sampling a signal with a
sampling frequency significantly higher than twice the
bandwidth or highest frequency of the signal being sampled
Irrelevant information
Perceptually insignificant
Cannot be recovered from remaining information
5. Audio Data Compression
Lossless Audio Compression
Removes redundant data
Resulting signal is identical to the original – perfect
reconstruction
Lossy Audio Encoding
Removes irrelevant data
Resulting signal is perceptually similar to the original, but not identical
6. Audio Data Compression
Audio vs. Speech Compression
Techniques
Speech Compression uses a human vocal
tract model to compress signals
Audio Compression does not use this
technique due to larger variety of possible
signal variations
7. Generic Audio Encoder
Psychoacoustic Model
Psychoacoustics – study of how sounds are
perceived by humans
Uses perceptual coding
Eliminates information from the audio signal that is
inaudible to the ear
Detects conditions under which different audio
signal components mask each other
8. Psychoacoustic Model
Signal Masking
Threshold cut-off
Spectral (Frequency / Simultaneous) Masking
Temporal Masking
Threshold cut-off and spectral masking
occur in frequency domain, temporal
masking occurs in time domain
9. Signal Masking
Threshold cut-off
Hearing threshold level – a function of frequency
Any frequency components below the threshold will not be
perceived by the human ear
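To make the idea of a frequency-dependent hearing threshold concrete, here is a minimal sketch. It assumes Terhardt's widely quoted approximation of the threshold in quiet (in dB SPL); the slides do not say which curve an encoder uses, so treat both the formula choice and the function names as assumptions.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in dB SPL (Terhardt's approximation)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def is_audible(freq_hz, level_db_spl):
    """A component is audible only if its level exceeds the threshold."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)

for freq in (100, 1000, 3300, 15000):
    print(freq, round(threshold_in_quiet_db(freq), 1))

print(is_audible(100, 10))   # a 10 dB SPL tone at 100 Hz is below the threshold
print(is_audible(1000, 10))  # the same level at 1 kHz is above it
```

The curve is lowest around 3 to 4 kHz, where hearing is most sensitive, and rises steeply toward both ends of the audible range.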
10. Signal Masking
Spectral Masking
A frequency component can be partly or fully masked by
another component that is close to it in frequency
This shifts the hearing threshold
11. Signal Masking
Temporal Masking
A quieter sound can be masked by a louder sound if they
are temporally close
Sounds that occur shortly before or shortly after a volume
increase can both be masked
12. Spectral Analysis
Spectrum analyzer – a device or algorithm that computes a
frequency domain representation of a
time domain signal
Tasks of Spectral Analysis
To derive masking thresholds to determine which
signal components can be eliminated
To generate a representation of the signal to which
masking thresholds can be applied
Spectral Analysis is done through transforms or
filter banks
13. Spectral Analysis
Transforms
Fast Fourier Transform (FFT)
Discrete Cosine Transform (DCT) - similar to the
Fourier transform but uses only cosine basis functions
Modified Discrete Cosine Transform (MDCT)
[used by MPEG-1 Layer-III, MPEG-2 AAC,
Dolby AC-3] – overlapped and windowed
version of DCT
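The "overlapped and windowed" property of the MDCT can be sketched in plain Python. This is an illustrative, unoptimized version that assumes a sine window and a block size of 2m samples with 50% overlap; real codecs use fast algorithms and their own window shapes, and all names here are hypothetical.

```python
import math

def sine_window(m):
    """Sine window of length 2m; squared halves sum to 1 (needed for overlap-add)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]

def mdct(block):
    """Map 2m time samples to m frequency coefficients."""
    m = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]

def imdct(coeffs):
    """Map m coefficients back to 2m (time-aliased) samples."""
    m = len(coeffs)
    return [(2.0 / m) * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                            for k in range(m))
            for n in range(2 * m)]

def analyze_and_resynthesize(signal, m):
    """Window, transform, inverse-transform, window again, and overlap-add."""
    window = sine_window(m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):  # hop size m = 50% overlap
        block = [signal[start + n] * window[n] for n in range(2 * m)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * m):
            out[start + n] += rebuilt[n] * window[n]
    return out

m = 8
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(6 * m)]
rebuilt = analyze_and_resynthesize(signal, m)
# The first and last m samples are covered by only one block, so compare the interior.
max_error = max(abs(a - b) for a, b in zip(signal[m:-m], rebuilt[m:-m]))
print(max_error < 1e-9)
```

Each block of 2m samples yields only m coefficients, yet the aliasing introduced by one block is cancelled by its neighbours during overlap-add, so no data is duplicated despite the overlap.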
14. Spectral Analysis
Filter Banks
a filter bank is an array of band-pass filters that
separates the input signal into multiple
components, each one carrying a single
frequency subband of the original signal
Time sample blocks are passed through a set of
bandpass filters
Masking thresholds are applied to resulting frequency
subband signals
Polyphase and wavelet banks are the most popular filter
structures
15. Filter Bank Structures
Polyphase Filter Bank
[used in all of the MPEG-1 encoders]
Signal is separated into subbands, the widths
of which are equal over the entire frequency
range
The resulting subband signals are
downsampled to create shorter signals (which
are later reconstructed during decoding
process)
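The split, downsample, and reconstruct steps can be illustrated with the smallest possible equal-width filter bank. This sketch uses a two-band Haar filter pair purely as a toy; MPEG-1 uses 32 subbands with much longer filters, so none of the names or coefficients below come from the standard.

```python
import math

SCALE = 1 / math.sqrt(2)

def analyze(signal):
    """Split an even-length signal into a low and a high subband.

    Each subband is downsampled by 2, so together they hold as many
    samples as the input.
    """
    low = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return low, high

def synthesize(low, high):
    """Rebuild the full-rate signal from the two downsampled subbands."""
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) * SCALE)
        out.append((lo - hi) * SCALE)
    return out

signal = [0.0, 1.0, 4.0, 2.0, -3.0, -1.0, 0.5, 0.5]
low, high = analyze(signal)
print(len(low), len(high))  # 4 4
rebuilt = synthesize(low, high)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)) < 1e-12)
```

In an encoder, the masking thresholds would be applied to the subband signals between the two steps; here nothing is discarded, so the reconstruction matches the input.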
16. Filter Bank Structures
Wavelet Filter Bank
[used by Enhanced Perceptual Audio
Coder (EPAC) by Lucent]
Unlike the polyphase filter bank, the subbands are not
of equal width (narrower for lower frequencies, wider
for higher frequencies)
This allows for better time resolution at high frequencies
(ex. short attacks), but at the expense of frequency
resolution
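A sketch of how unequal subbands arise in a wavelet-style filter bank: the signal is split in two, and only the low half is split again at each level. The Haar filter pair is used as a stand-in here; the slides do not state which wavelet EPAC uses, so this is an assumption for illustration only.

```python
import math

SCALE = 1 / math.sqrt(2)

def haar_split(signal):
    """One two-band split with downsampling by 2 (toy Haar filters)."""
    low = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return low, high

def wavelet_bands(signal, levels):
    """Return subbands ordered from highest to lowest frequency."""
    bands = []
    low = list(signal)
    for _ in range(levels):
        low, high = haar_split(low)  # only the low band is split again
        bands.append(high)
    bands.append(low)
    return bands

signal = [float(n % 5) for n in range(16)]
bands = wavelet_bands(signal, levels=3)
print([len(b) for b in bands])  # [8, 4, 2, 2]
```

The first band holds half of all samples and spans the upper half of the spectrum with fine time resolution, while the last two bands hold 2 samples each and span one eighth of the spectrum each.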
17. Noise Allocation
System Task: derive and apply shifted hearing
threshold to the input signal
Anything below the threshold doesn’t need to be
transmitted
Any noise below the threshold is irrelevant
Frequency component quantization
Tradeoff between space and noise
Encoder saves on space by using just enough bits for
each frequency component to keep noise under the
threshold - this is known as noise allocation
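A rough sketch of the allocation step, assuming the common rule of thumb that each quantizer bit lowers quantization noise by about 6.02 dB relative to the signal. The band levels and thresholds below are made-up numbers, and real encoders use iterative loops with a bit budget rather than this one-shot formula.

```python
import math

DB_PER_BIT = 6.02  # assumed rule of thumb for a uniform quantizer

def bits_needed(signal_db, threshold_db):
    """Fewest bits that push quantization noise below the masking threshold."""
    if signal_db <= threshold_db:
        return 0  # the component itself is inaudible, so it is not transmitted
    return math.ceil((signal_db - threshold_db) / DB_PER_BIT)

# Hypothetical (signal level, shifted hearing threshold) pairs in dB per band
bands = [(60.0, 20.0), (45.0, 40.0), (10.0, 25.0)]
allocation = [bits_needed(s, t) for s, t in bands]
print(allocation)  # [7, 1, 0]
```

The second band shows the payoff of masking: a fairly loud component needs only 1 bit because the threshold around it has been raised to 40 dB.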
18. Noise Allocation
Pre-echo
When a single audio block contains silence followed
by a loud attack, pre-echo occurs: after decoding,
quantization noise is audible in the silent part of
the block
This is avoided by monitoring the audio data ahead of time
at the encoding stage and splitting the audio into shorter
blocks where pre-echo is likely
This does not completely eliminate pre-echo, but can
make it short enough to be masked by the attack
(temporal masking)
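The block-splitting decision can be sketched as a simple energy comparison. This is only one plausible detector: the sub-block count, the energy ratio, and the function names are all assumptions, and actual encoders use their own transient detection criteria.

```python
def block_energy(samples):
    return sum(s * s for s in samples)

def choose_block_lengths(block, short_count=4, attack_ratio=10.0):
    """Return [len(block)] for a steady block, or short_count short lengths
    when one sub-block is much louder than the sub-block before it.

    Assumes len(block) is divisible by short_count.
    """
    size = len(block) // short_count
    parts = [block[i * size:(i + 1) * size] for i in range(short_count)]
    energies = [block_energy(p) for p in parts]
    for before, after in zip(energies, energies[1:]):
        # The small constant avoids treating silence-to-silence as an attack.
        if after > attack_ratio * (before + 1e-12):
            return [size] * short_count
    return [len(block)]

steady = [0.5] * 16
attack = [0.0] * 12 + [1.0] * 4       # silence followed by a loud attack
print(choose_block_lengths(steady))   # [16]
print(choose_block_lengths(attack))   # [4, 4, 4, 4]
```

With short blocks, quantization noise caused by the attack is confined to the last 4 samples before it instead of being spread over all 16.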
19. Additional Encoding Techniques
Other encoding techniques are
available (as alternatives or in combination)
Predictive Coding
Coupling / Delta Encoding
Huffman Encoding
20. Additional Encoding Techniques
Predictive Coding
Often used in speech and image compression
Estimates the expected value for each sample based
on previous sample values
Transmits/stores the difference between the expected
and actual value
The decoder generates the same estimate for each sample
and then adjusts it by the difference stored for that
sample
Used for additional compression in MPEG-2 AAC
(Advanced Audio Coding)
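The estimate-and-difference steps above can be sketched with the simplest possible predictor, one that expects each sample to equal the previous one. This predictor is an assumption chosen for brevity; AAC and speech codecs use far more elaborate prediction.

```python
def predict(previous):
    """Hypothetical first-order predictor: expect the previous sample to repeat."""
    return previous

def encode(samples):
    """Store the difference between each sample and its predicted value."""
    residuals = []
    previous = 0
    for sample in samples:
        residuals.append(sample - predict(previous))
        previous = sample
    return residuals

def decode(residuals):
    """Rebuild each sample by correcting the same prediction with the stored difference."""
    samples = []
    previous = 0
    for residual in residuals:
        sample = predict(previous) + residual
        samples.append(sample)
        previous = sample
    return samples

samples = [100, 102, 105, 107, 108, 108, 106]
residuals = encode(samples)
print(residuals)                     # [100, 2, 3, 2, 1, 0, -2]
print(decode(residuals) == samples)  # True
```

After the first value, the residuals are small numbers that need fewer bits than the original samples, which is where the compression comes from.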
21. Additional Encoding Techniques
Coupling / Delta encoding
Used in cases where audio signal consists of two or
more channels (stereo or surround sound)
Similarities between channels are used for
compression
The sum and difference of the two channels are
derived; the difference is usually close to zero
and therefore requires less space to encode
This is a lossless encoding process
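A minimal sketch of the sum and difference idea for integer samples. The sample values are invented, and practical codecs typically use a scaled variant (for example a halved sum) to avoid the extra bit that a plain sum requires.

```python
def encode_stereo(left, right):
    """Replace (left, right) with (sum, difference) per sample."""
    sums = [l + r for l, r in zip(left, right)]
    diffs = [l - r for l, r in zip(left, right)]
    return sums, diffs

def decode_stereo(sums, diffs):
    """Recover the channels exactly: s + d = 2 * left and s - d = 2 * right."""
    left = [(s + d) // 2 for s, d in zip(sums, diffs)]
    right = [(s - d) // 2 for s, d in zip(sums, diffs)]
    return left, right

left = [1000, 1010, 990, -500]
right = [998, 1011, 990, -497]
sums, diffs = encode_stereo(left, right)
print(diffs)                                        # [2, -1, 0, -3]
print(decode_stereo(sums, diffs) == (left, right))  # True
```

Because the two channels are similar, the difference values stay near zero and can be stored with far fewer bits than a full channel.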
22. Additional Encoding Techniques
Huffman Coding
Information-theory-based technique
A value that occurs often in the signal is represented
by a shorter code, and rarer values by longer codes;
the mapping is stored in a look-up table
Implemented using look-up tables in the encoder and in
the decoder
Provides additional lossless compression on top of the
quantized values, at the cost of extra computation in
the encoder and decoder
Used by MPEG-1 Layer III and MPEG-2 AAC
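A small sketch of Huffman coding over a list of quantized values. It builds the code table from the data itself, which is an assumption made for brevity; MPEG audio codecs instead pick from predefined tables shared by encoder and decoder.

```python
import heapq
from collections import Counter

def build_codes(symbols):
    """Build a {symbol: bit string} table; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: code so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)  # two least frequent groups
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)

def decode(bits, codes):
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # codes are prefix-free, so the first match is correct
            out.append(lookup[current])
            current = ""
    return out

data = [0] * 8 + [1] * 4 + [2] * 2 + [-1] * 2
codes = build_codes(data)
bits = encode(data, codes)
print(len(bits), "bits instead of", 2 * len(data))  # 28 bits instead of 32
print(decode(bits, codes) == data)                  # True
```

The fixed-length baseline assumes 2 bits per value for the four distinct symbols; the gain grows as the value distribution becomes more skewed toward a few common values.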
23. Encoding - Final Stages
Audio data packed into frames
Frames stored or transmitted
Hello. Today I will talk about techniques commonly used for digital audio compression across various audio file formats.
- I will discuss the difference between redundant and irrelevant information later in my presentation.
- Depending on whether the audio is stored or transmitted, compression reduces either the file size or the required bandwidth.