Audio Compression
Introduction
 An audio (analog) signal is digitized using PCM, a process that involves
  SAMPLING.
 Sampling rate >= 2 x (highest frequency component), i.e. the Nyquist rate.
 Band-limited signal: when the bandwidth of the communication channel to be
  used is too small to support the minimum sampling rate, the signal must
  first be band-limited.
 Speech signal (15 Hz-10 kHz):
   Maximum frequency component: 10 kHz
   Minimum sampling rate: 2 x 10 = 20 ksps
   Bits per sample: 12
   Bit rate: sampling rate x bits per sample = 20 ksps x 12 = 240 kbps
 General audio signal (50 Hz-20 kHz):
    Maximum frequency component: 20 kHz
    Minimum sampling rate: 2 x 20 = 40 ksps
    Bits per sample: 16
    Bit rate: 40 ksps x 16 = 640 kbps per channel, i.e. 1.28 Mbps for
     stereo (two channels)
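The bit-rate figures above follow directly from the Nyquist criterion. A minimal Python sketch of the arithmetic (the function names are illustrative only, and sampling exactly at the Nyquist rate is assumed):

```python
def nyquist_rate(f_max_hz: float) -> float:
    """Minimum sampling rate (samples/s) for a signal whose highest component is f_max_hz."""
    return 2.0 * f_max_hz

def pcm_bit_rate(f_max_hz: float, bits_per_sample: int, channels: int = 1) -> float:
    """PCM bit rate in bits/s when sampling at the Nyquist rate."""
    return nyquist_rate(f_max_hz) * bits_per_sample * channels

speech = pcm_bit_rate(10_000, 12)                     # 240 kbps
music_mono = pcm_bit_rate(20_000, 16)                 # 640 kbps
music_stereo = pcm_bit_rate(20_000, 16, channels=2)   # 1.28 Mbps
print(speech, music_mono, music_stereo)
```

The stereo case shows where the 1.28 Mbps figure comes from: two 640 kbps channels.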
Why is Audio Compression needed?
 In most multimedia applications, the bandwidth of the available
  communication channels does not support bit rates as high as 240 kbps or
  1.28 Mbps, but offers lower bit rates.
 So what is the solution? There are two possible solutions:
 Solution 1: sample the audio signal at a lower rate (the BAD one)
   Merit: simple to implement
   Demerits: 1. The quality of the decoded signal is reduced, because the
                high-frequency components of the original signal are lost
             2. Using fewer bits per sample results in high quantization
                noise (QN)
 Solution 2: use a compression algorithm (the GOOD one)
   Gives good perceptual quality
   Reduces the bandwidth requirement
 The rest of this section discusses audio compression methods.
1. Differential Pulse Code Modulation (DPCM)
 Differential pulse code modulation is a derivative of standard PCM.
 It exploits the fact that the range of differences in amplitude between
  successive samples of the audio waveform is smaller than the range of
  the actual sample amplitudes.
 Hence fewer bits are needed to represent the difference signal than are
  needed by PCM for the same sampling rate.
 It reduces the bit rate requirement from 64 kbps to 56 kbps.
DPCM Principles
Operation of DPCM:
Encoder
 The previously digitized sample is held in the register (R).
 The DPCM signal is computed by subtracting the current register contents
  (R0) from the new output of the ADC (PCM).
 The register value is then updated before transmission.
 DPCM = PCM - R0
Decoder
 The decoder simply adds the previous register contents (the previous PCM
  value, R0) to the received DPCM signal.
 R1 = R0 + DPCM
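The encoder and decoder equations above can be sketched in a few lines of Python. This is a minimal, lossless illustration under stated assumptions: integer PCM samples, a register that starts at 0, and differences that are sent without further quantization. The function names are hypothetical.

```python
from typing import List

def dpcm_encode(pcm: List[int]) -> List[int]:
    """Encoder: DPCM = PCM - R0, where register R0 holds the previous sample."""
    register = 0  # assumed initial register contents
    out = []
    for sample in pcm:
        out.append(sample - register)  # DPCM = PCM - R0
        register = sample              # update the register for the next sample
    return out

def dpcm_decode(dpcm: List[int]) -> List[int]:
    """Decoder: R1 = R0 + DPCM."""
    register = 0
    out = []
    for diff in dpcm:
        register = register + diff     # R1 = R0 + DPCM
        out.append(register)
    return out

pcm = [100, 104, 109, 111, 110, 106]
diffs = dpcm_encode(pcm)
print(diffs)                        # [100, 4, 5, 2, -1, -4]
print(dpcm_decode(diffs) == pcm)    # True
```

After the first sample, the differences span a much smaller range than the samples themselves, which is why fewer bits per sample suffice.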


Limitation of DPCM:
 The ADC operation introduces a quantization error each time, and these
  errors accumulate in the value stored in the register (R).
 So the previous value (R) is only an approximation. A more accurate
  version of the previous signal is needed, which is what predictive DPCM
  provides.
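The cumulative error can be demonstrated by quantizing the difference signal. The sketch below assumes a uniform quantizer with step 1.0 and a slowly rising input. In the "open-loop" case the encoder register holds the exact previous sample, which the decoder never sees, so the decoder drifts. The "closed-loop" case is shown only for comparison: there the encoder register holds the same reconstructed value as the decoder. Both the names and this comparison are illustrative, not taken from the slides.

```python
from typing import List

def quantize(diff: float, step: float) -> float:
    """Uniform quantizer for the difference signal."""
    return step * round(diff / step)

def dpcm_reconstruct(pcm: List[float], step: float, closed_loop: bool) -> List[float]:
    """Return the decoder's output when each DPCM difference is quantized."""
    enc_reg = 0.0  # encoder register R
    dec_reg = 0.0  # decoder register
    out = []
    for x in pcm:
        q = quantize(x - enc_reg, step)  # quantized difference actually transmitted
        dec_reg += q                     # decoder: R1 = R0 + DPCM
        out.append(dec_reg)
        # Open loop: R holds the exact previous sample.
        # Closed loop: R holds the decoder's reconstructed value.
        enc_reg = dec_reg if closed_loop else x
    return out

pcm = [0.4 * k for k in range(1, 51)]  # slow ramp: every difference is about 0.4
for mode in (False, True):
    out = dpcm_reconstruct(pcm, step=1.0, closed_loop=mode)
    print("closed loop" if mode else "open loop",
          max(abs(a - b) for a, b in zip(pcm, out)))
```

In the open-loop run every 0.4 difference rounds to 0, so the decoder output never moves while the input climbs to 20: the per-sample errors accumulate exactly as described above.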
2. Third Order Predictive DPCM
 To reduce this noise effect, predictive methods are used to predict a
  more accurate version of the previous signal (using not only the current
  signal but also varying proportions of a number of the preceding
  estimated signals).
 The proportions used are known as predictor coefficients.
 The difference signal is computed by subtracting varying proportions of
  the last three predicted values from the current output of the ADC.
 It reduces the bit rate requirement from 64 kbps to 32 kbps.
Figure: Third-order predictive DPCM signal encoder and decoder
Operation of Third Order Predictive DPCM
 Proportions of the values held in registers R1, R2 and R3 are subtracted
  from the PCM value.
 After each sample, the value in R1 is transferred to R2, the value in R2
  to R3, and the new predicted value goes into R1.
 The decoder operates in a similar way, adding the same proportions of the
  last three computed PCM signals to the received DPCM signal.
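A third-order predictor and the R1 -> R2 -> R3 shift can be sketched as follows. The coefficient values are hypothetical (the slides do not give any), and in this lossless sketch the registers hold the reconstructed samples, which here are identical to the PCM values.

```python
from typing import List, Sequence, Tuple

# Hypothetical predictor coefficients C1, C2, C3 applied to R1, R2, R3.
COEFFS: Tuple[float, float, float] = (0.5, 0.3, 0.2)

def predict(regs: Sequence[float], coeffs: Sequence[float] = COEFFS) -> float:
    """Weighted sum of the values held in R1, R2 and R3."""
    return sum(c * r for c, r in zip(coeffs, regs))

def encode(pcm: List[float], coeffs: Sequence[float] = COEFFS) -> List[float]:
    regs = [0.0, 0.0, 0.0]                      # R1, R2, R3
    out = []
    for x in pcm:
        out.append(x - predict(regs, coeffs))   # DPCM = PCM - prediction
        regs = [x] + regs[:2]                   # R1 -> R2, R2 -> R3, newest value -> R1
    return out

def decode(dpcm: List[float], coeffs: Sequence[float] = COEFFS) -> List[float]:
    regs = [0.0, 0.0, 0.0]
    out = []
    for d in dpcm:
        x = d + predict(regs, coeffs)           # add back the same proportions
        out.append(x)
        regs = [x] + regs[:2]
    return out

pcm = [10.0, 12.0, 15.0, 17.0, 18.0, 18.0, 16.0]
residual = encode(pcm)
print(residual)
print(decode(residual))
```

Because the decoder uses the same coefficients and the same register updates, it recovers the input exactly (up to floating-point rounding).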
3. Adaptive differential PCM (ADPCM)
 The first ADPCM international standard is defined in ITU-T
  Recommendation G.721.
   Bandwidth can be saved by varying the number of bits used for the
    difference signal depending on its amplitude (fewer bits to encode
    smaller difference signals).
   It is based on the same principle as DPCM, except that an eighth-order
    predictor is used and the number of bits used to quantize each
    difference is varied.
   At the 8 ksps telephony sampling rate, 32 kbps corresponds to 4 bits
    per difference sample, which gives a better-quality output than
    third-order DPCM; 16 kbps corresponds to 2 bits per sample, if lower
    bandwidth is more important.
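The idea of spending fewer bits on smaller differences can be illustrated with a hypothetical block-based scheme in which the encoder picks one word length per block of difference samples. This is not the G.721 algorithm, which instead adapts its quantizer step size. The block size of 8, the sign-magnitude word format and the function names are all assumptions, and the per-block header needed to signal the word length is ignored.

```python
from typing import List, Tuple

def word_length(diffs: List[int]) -> int:
    """Sign-magnitude width: bits for the largest magnitude plus one sign bit."""
    return max(abs(d) for d in diffs).bit_length() + 1

def adaptive_block_widths(pcm: List[int], block: int = 8) -> Tuple[List[int], int]:
    """Return the word length chosen for each block of differences and the total payload bits."""
    prev = 0
    diffs = []
    for x in pcm:
        diffs.append(x - prev)
        prev = x
    widths = []
    total = 0
    for i in range(0, len(diffs), block):
        chunk = diffs[i:i + block]
        w = word_length(chunk)
        widths.append(w)
        total += w * len(chunk)
    return widths, total

pcm = [0, 1, 2, 3, 3, 3, 3, 3, 3, 40, 80, 40, 3, 3, 3, 3]
print(adaptive_block_widths(pcm))   # ([2, 7], 72)
```

The quiet first block needs only 2 bits per difference, while the block containing the transient needs 7.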

 The second ADPCM international standard is defined in ITU-T
  Recommendation G.722.
   It gives better sound quality at the cost of added complexity.
   The input speech bandwidth is extended to 50 Hz-7 kHz, compared with
    3.4 kHz for a standard PCM system.
   The wider bandwidth gives higher quality, as needed in video
    conferencing.
 It uses subband coding.
 In this coding, the input signal is passed through two filters prior to
    sampling:
      one passes only signal frequencies in the range 50 Hz-3.5 kHz, and
      the other only frequencies in the range 3.5 kHz-7 kHz.
   In this way the input signal is effectively divided into two separate
    equal-bandwidth signals:
      the first known as the lower subband signal, and
      the second as the upper subband signal.
   Each is then sampled and encoded independently using ADPCM.
   The use of two subbands has the advantage that different bit rates can
    be used for each.
   The two bitstreams are multiplexed to produce the transmitted signal,
    in such a way that the decoder in the receiver can divide them back
    into two separate streams for decoding.
   The operating bit rates are 64, 56 or 48 kbps.
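The three operating rates can be reproduced with simple arithmetic. This sketch assumes the G.722 arrangement of each subband sampled at 8 ksps, with 2 bits per upper-subband sample and 6, 5 or 4 bits per lower-subband sample (the lower-rate modes free the remaining capacity for auxiliary data). The function name is hypothetical.

```python
SUBBAND_RATE = 8_000  # samples/s per subband (assumed: 16 ksps input split into two bands)

def g722_bit_rate(lower_bits: int, upper_bits: int = 2) -> int:
    """Multiplexed bit rate of the two ADPCM subband streams, in bits/s."""
    return SUBBAND_RATE * (lower_bits + upper_bits)

for lower in (6, 5, 4):
    print(lower, g722_bit_rate(lower))   # 64000, 56000, 48000
```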
Figure: ADPCM subband encoder and decoder schematic
4. Adaptive predictive coding
 Even higher levels of compression are possible, at higher levels of
    complexity.
   These can be obtained by also making the predictor coefficients
    adaptive.
   In practice, the optimum set of predictor coefficients varies
    continuously, since it is a function of the characteristics of the
    audio signal being digitized.
   The optimum set of coefficients is therefore computed continuously and
    used to predict the previous signal more accurately.
   This type of compression can reduce the bandwidth requirement to
    8 kbps while still obtaining an acceptable perceived quality.
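One way to make the predictor coefficients adaptive is to recompute, for every block of samples, the coefficients that minimise the squared prediction error. The sketch below does this for a second-order predictor by solving the 2x2 normal equations. The predictor order, the 160-sample block length and the least-squares criterion are assumptions chosen for brevity, not details from the slides.

```python
import math
from typing import List, Tuple

def optimal_coeffs(block: List[float]) -> Tuple[float, float]:
    """Least-squares (a1, a2) minimising sum((x[n] - a1*x[n-1] - a2*x[n-2])**2) over the block."""
    r11 = r12 = r22 = p1 = p2 = 0.0
    for n in range(2, len(block)):
        x0, x1, x2 = block[n], block[n - 1], block[n - 2]
        r11 += x1 * x1
        r12 += x1 * x2
        r22 += x2 * x2
        p1 += x0 * x1
        p2 += x0 * x2
    det = r11 * r22 - r12 * r12
    if abs(det) < 1e-12:
        return 0.0, 0.0  # degenerate (e.g. silent) block: no prediction
    # Solve [r11 r12; r12 r22] [a1 a2]^T = [p1 p2]^T by Cramer's rule.
    a1 = (p1 * r22 - p2 * r12) / det
    a2 = (r11 * p2 - r12 * p1) / det
    return a1, a2

def adaptive_coeffs(signal: List[float], block_len: int = 160) -> List[Tuple[float, float]]:
    """Recompute the optimum coefficients for every block of block_len samples."""
    return [optimal_coeffs(signal[i:i + block_len])
            for i in range(0, len(signal) - block_len + 1, block_len)]

# A tone that changes frequency half-way: the optimum coefficients change with it.
signal = ([math.sin(0.1 * n) for n in range(160)]
          + [math.sin(0.4 * n) for n in range(160, 320)])
for a1, a2 in adaptive_coeffs(signal):
    print(round(a1, 4), round(a2, 4))
```

For a pure tone of angular frequency w, the exact predictor is x[n] = 2cos(w)x[n-1] - x[n-2], so each block's coefficients track the tone currently present.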
5.Linear predictive coding
 With this coding, the perceptual features of an audio waveform are
  first analysed by the source.
 These are then quantized and sent, and the destination uses them,
  together with a sound synthesizer, to regenerate a sound that is
  perceptually comparable with the source audio signal.
 With this compression technique, very high levels of compression can be
  achieved, although the generated speech often sounds synthetic.
 So which perceptual features need to be analysed?
 In terms of speech, three features determine the perception of a signal
  by the ear:
   Pitch: this is closely related to the frequency of the signal. It is
    important since the ear is more sensitive to signals in the range
    2-5 kHz.
   Period: this is the duration of the signal.
   Loudness: this is determined by the amount of energy in the signal.


 In addition, the origin of the sound is also important. These are
  called vocal tract excitation parameters:
   Voiced sounds: these are generated through the vocal cords, e.g. the
    letters m, v and i.
   Unvoiced sounds: with these the vocal cords are open, e.g. the letters
    f and s.
Operation of LPC encoder and decoder
    ENCODER:
   The input speech waveform is first sampled and quantized at a defined
    rate.
   A block of digitized samples, known as a segment, is then analysed to
    determine the various perceptual parameters of the speech that it
    contains.
   The output of the encoder is a string of frames, one for each segment.
   Each frame contains:
       fields for pitch and loudness
       the period, determined by the sampling rate being used
       a notification of whether the signal is voiced or unvoiced
       a set of computed model coefficients
 Some LPC encoders use up to ten sets of previous model coefficients to
    predict the output sound; these are called LPC-10 and use bit rates as
    low as 2.4 kbps or even 1.2 kbps.
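The "set of computed model coefficients" is commonly obtained with the autocorrelation method and the Levinson-Durbin recursion. The slides do not name a method, so this is a sketch under that assumption; the predictor order is a free parameter, and the function names are illustrative.

```python
from typing import List, Tuple

def autocorrelation(segment: List[float], max_lag: int) -> List[float]:
    """r[k] = sum of segment[i] * segment[i + k] for k = 0..max_lag."""
    n = len(segment)
    return [sum(segment[i] * segment[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Predictor coefficients a[1..order] with x[n] ~ sum(a[k] * x[n-k]), plus the residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # signal already perfectly predicted
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)           # prediction-error energy shrinks at each order
    return a[1:], err

# Autocorrelation of a first-order process with coefficient 0.5:
print(levinson_durbin([1.0, 0.5, 0.25, 0.125], order=3))   # ([0.5, 0.0, 0.0], 0.75)
```

The recursion finds the single non-zero coefficient 0.5 and leaves the higher-order ones at zero, as expected for that autocorrelation.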
Operation of LPC encoder and decoder (cont.)
  DECODER
 The speech signal generated by the vocal tract model in the decoder is a
  function of:
   the present output of the speech synthesizer (as determined by the
    current set of model coefficients),
   plus a linear combination of the previous set of model coefficients.
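A common way to realise the decoder is an all-pole synthesis filter driven by a pulse train (voiced frames) or by noise (unvoiced frames). The sketch assumes that excitation model, a gain derived from the loudness field, and a pitch period given in samples. The recursion y[n] = G*e[n] + sum(a_k * y[n-k]) is one interpretation of the "linear combination" above; the function names are hypothetical.

```python
import random
from typing import List

def excitation(n: int, voiced: bool, pitch_period: int, seed: int = 0) -> List[float]:
    """Pulse train for voiced frames, white noise for unvoiced frames."""
    if voiced:
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def synthesize(coeffs: List[float], gain: float, exc: List[float]) -> List[float]:
    """All-pole filter: y[n] = gain*e[n] + sum over k of coeffs[k-1]*y[n-k]."""
    y: List[float] = []
    for n, e in enumerate(exc):
        acc = gain * e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * y[n - k]
        y.append(acc)
    return y

frame = synthesize([0.5], gain=2.0, exc=excitation(8, voiced=True, pitch_period=4))
print(frame)   # [2.0, 1.0, 0.5, 0.25, 2.125, 1.0625, 0.53125, 0.265625]
```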


  APPLICATION
 The sound generated at these low bit rates is very synthetic, so LPC
  encoders are used primarily in military applications, where bandwidth
  is all-important.
Figure: Linear predictive coding (LPC) signal encoder and decoder
6. Code-excited LPC (CELPC)
 The synthesizers used in most LPC decoders are based on a very basic
    model of the vocal tract.
   CELP coders are intended for applications in which the available
    bandwidth is limited but the perceived quality of the speech must be
    of an acceptable standard for use in various multimedia applications.
   In the CELP model, instead of treating each digitized segment
    independently for encoding purposes, just a limited set of segments is
    used, each known as a wave template.
   A precomputed set of templates is held by both the encoder and the
    decoder in what is known as the template codebook.
   Each of the individual digitized samples that make up a particular
    template in the codebook is differentially encoded.
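Choosing a template can be sketched as a search for the codebook entry, and gain, that best match the target segment. This is a simplified sketch: it compares templates with the segment directly using squared error, whereas practical CELP coders pass each candidate through the LPC synthesis filter and a perceptual weighting filter before comparing (analysis by synthesis). The tiny codebook and the function name are hypothetical.

```python
from typing import List, Tuple

def best_template(target: List[float], codebook: List[List[float]]) -> Tuple[int, float, float]:
    """Return (index, gain, squared error) of the best-matching codebook entry."""
    best = (-1, 0.0, float("inf"))
    for idx, tmpl in enumerate(codebook):
        energy = sum(c * c for c in tmpl)
        if energy == 0.0:
            continue
        # Optimal gain for this template: g = <target, tmpl> / <tmpl, tmpl>
        gain = sum(t * c for t, c in zip(target, tmpl)) / energy
        err = sum((t - gain * c) ** 2 for t, c in zip(target, tmpl))
        if err < best[2]:
            best = (idx, gain, err)
    return best

codebook = [[1, 0, -1, 0], [1, 1, 1, 1], [0, 1, 0, -1]]
print(best_template([2.0, 2.0, 2.0, 2.0], codebook))   # (1, 2.0, 0.0)
```

Only the index and the gain need to be sent, because the decoder holds the same codebook.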
 All coders of this type have a delay associated with them, which is
    incurred while each block of digitized samples is analysed by the
    encoder and the speech is reconstructed at the decoder.
   The combined delay value is known as the coder's processing delay.
   In addition, before the speech samples can be analysed, it is
    necessary to buffer the block of samples.
   The time to accumulate the block of samples is known as the
    algorithmic delay.
   The coder's delay is an important parameter: in conventional telephony
    applications a low-delay coder is required, whereas in some
    interactive applications a delay of several seconds before the speech
    starts is acceptable.
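The two delay components add together. A small worked example, assuming a hypothetical 160-sample block at 8 ksps and a hypothetical 5 ms combined encoder and decoder processing delay:

```python
def algorithmic_delay_ms(block_samples: int, sample_rate_hz: int) -> float:
    """Time needed to buffer one block of samples before analysis can start."""
    return 1000.0 * block_samples / sample_rate_hz

def total_delay_ms(block_samples: int, sample_rate_hz: int, processing_ms: float) -> float:
    """Algorithmic delay plus the coder's processing delay."""
    return algorithmic_delay_ms(block_samples, sample_rate_hz) + processing_ms

print(algorithmic_delay_ms(160, 8000))   # 20.0
print(total_delay_ms(160, 8000, 5.0))    # 25.0
```

Larger blocks give the encoder more signal to analyse but increase the algorithmic delay proportionally, which is the trade-off that matters for telephony.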
Perceptual Coding (PC)
 LPC and CELP are intended primarily for telephony applications, that is,
    for the compression of speech signals.
   Perceptual coders are designed for the compression of general audio,
    such as that associated with a digital television broadcast.
   They use a psychoacoustic model, which exploits a number of
    limitations of the human ear.
   Using this approach, sampled segments of the source audio waveform are
    analysed, but only those features that are perceptible to the ear are
    transmitted.
   For example, although the human ear is sensitive to signals in the
    range 15 Hz to 20 kHz, the level of sensitivity to each signal is
    non-linear; that is, the ear is more sensitive to some signals than
    others.
   What is that limitation of the human ear? The MASKING EFFECT.
 Frequency masking: when multiple signals are present in the audio, a
  strong signal may reduce the sensitivity of the ear to other signals
  that are near to it in frequency.
 Temporal masking: when the ear hears a loud sound, it takes a short but
  finite time before it can hear a quieter sound.
 The psychoacoustic model is used to identify the signals that are
  affected by masking, and these are then eliminated from the transmitted
  signal; this is how compression is achieved.
Sensitivity of the ear:
 The dynamic range of the ear is defined as the ratio of the loudest
    sound it can hear to the quietest sound.
   The sensitivity of the ear varies with the frequency of the signal, as
    shown in the next figure.
   The ear is most sensitive to signals in the range 2-5 kHz, hence
    signals in this band are the quietest the ear can hear.
   The vertical axis gives all other signal amplitudes relative to this
    signal (2-5 kHz).
   In the figure, although signals A and B have the same relative
    amplitude, only signal A would be heard, because it is above the
    hearing threshold while B is below it.
Figure: Sensitivity of the ear as a function of frequency
Frequency Masking
When an audio sound consisting of multiple frequency signals is present,
the sensitivity of the ear changes and varies with the relative amplitude
of the signals.
Conclusions from the diagram:
 Signal B is larger than signal A. This causes the basic sensitivity
  curve of the ear to be distorted in the region of signal B.
 Signal A will no longer be heard, as it is within the distortion band.


 Variation of the frequency masking effect with frequency:
 The masking effect at 1, 4 and 8 kHz is shown in the figure:
 The width of the masking curve (that is, the range of frequencies that
  are affected) increases with increasing frequency.
 The width of each curve at a particular signal level is known as the
  critical bandwidth for that frequency.
 For frequencies greater than 500 Hz, the critical bandwidth increases
  approximately linearly in multiples of 100 Hz.
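The growth of the critical bandwidth with frequency can be quantified with an empirical formula. The sketch swaps in Zwicker and Terhardt's approximation, which is not from these slides. It gives roughly 100 Hz at low frequencies and widening bands above about 500 Hz.

```python
def critical_bandwidth_hz(freq_hz: float) -> float:
    """Zwicker and Terhardt's empirical approximation of the critical bandwidth."""
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

for f in (100, 500, 1000, 4000, 8000):
    print(f, round(critical_bandwidth_hz(f)))
```

The widths increase with the centre frequency, matching the widening masking curves at 1, 4 and 8 kHz.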
Figure: Variation of the frequency masking effect with frequency
Temporal masking
 After the ear hears a loud sound, it takes a further short time before it
    can hear a quieter sound.
   This is known as temporal masking.
   After the loud sound ceases, it takes a short period of time for the
    signal amplitude to decay.
   During this time, signals whose amplitudes are less than the decay
    envelope will not be heard and hence need not be transmitted.
   To exploit this phenomenon, the input audio waveform must be processed
    over a time period comparable with that associated with temporal
    masking.
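Dropping components that fall under the decay envelope can be sketched as follows. The envelope shape (a straight line in dB), the decay rate of 0.3 dB/ms and the 100 ms masking window are all hypothetical values chosen for illustration, and the hearing threshold in quiet is ignored outside the window.

```python
from typing import List, Tuple

def post_masking_threshold_db(masker_db: float, t_ms: float,
                              decay_db_per_ms: float = 0.3,
                              duration_ms: float = 100.0) -> float:
    """Hypothetical linear-in-dB decay envelope after the loud sound stops at t = 0."""
    if t_ms < 0 or t_ms > duration_ms:
        return float("-inf")  # outside the masking window: nothing is masked
    return masker_db - decay_db_per_ms * t_ms

def components_to_send(masker_db: float,
                       events: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep only (time_ms, level_db) components above the decay envelope."""
    return [(t, lvl) for t, lvl in events
            if lvl > post_masking_threshold_db(masker_db, t)]

events = [(5.0, 60.0), (20.0, 86.0), (50.0, 60.0), (150.0, 30.0)]
print(components_to_send(90.0, events))   # [(20.0, 86.0), (150.0, 30.0)]
```

Only the component that rises above the decaying envelope, and the one that occurs after the masking window has ended, need to be transmitted.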
Figure: Temporal masking caused by a loud signal
Audio Compression – MPEG Audio coder

The Moving Picture Experts Group (MPEG) was formed by the ISO to formulate
a set of standards relating to a range of multimedia applications that
involve the use of video with sound. The coders associated with audio
compression that form part of these standards are known as MPEG audio
coders.
Why Do We Need International Standards?
 International standardization is conducted to achieve interoperability.
    Only the syntax and the decoder are specified.
    The encoder is not standardized, and its optimization is left to the
     manufacturer.
 Standards provide state-of-the-art technology developed by a group of
  experts in the field.
    They not only solve current problems but also anticipate future
     application requirements.
Figure: MPEG audio coder, forward adaptive bit allocation mode
MPEG audio coder
 The audio input signal is first sampled and quantized using PCM.
 The bandwidth available for transmission is divided into a number of
  frequency subbands using a bank of analysis filters.
 Analysis filter bank:
   Maps each set of 32 (time-related) PCM samples into an equivalent set
    of 32 frequency samples.
   Determines the peak amplitude in each subband (consisting of 12
    frequency components), called the scaling factor.
 Processing associated with both frequency and temporal masking is
  carried out by the psychoacoustic model.
 In the basic encoder, the time duration of each sampled segment of the
  audio input signal is equal to the time to accumulate 12 successive
  sets of 32 PCM samples.
 The 12 sets of 32 PCM time samples are converted into frequency
  components using a DFT.
 The output of the psychoacoustic model is a set of what are known as
  signal-to-mask ratios (SMRs), which indicate the frequency components
  whose amplitude is below the audible threshold.
 This allows more bits to be allocated to the regions of highest
  sensitivity than to less sensitive regions.
 In the encoder, all the frequency components are carried in a frame.
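Forward adaptive bit allocation driven by the SMRs can be sketched as a greedy loop. The loop repeatedly gives one more bit per sample to the subband whose quantization noise is furthest above its mask, measured as SMR minus the SNR already achieved. This is a simplified sketch under stated assumptions: about 6.02 dB of SNR per bit, a fixed bit pool, and 12 samples per subband. The standard itself uses tabulated SNR values, and the function name is hypothetical.

```python
from typing import List

DB_PER_BIT = 6.02  # approximate SNR gained per extra bit of uniform quantization

def allocate_bits(smr_db: List[float], bit_pool: int,
                  samples_per_subband: int = 12, max_bits: int = 15) -> List[int]:
    """Greedy bit allocation driven by signal-to-mask ratios (simplified)."""
    bits = [0] * len(smr_db)
    remaining = bit_pool
    while remaining >= samples_per_subband:
        best, best_deficit = None, 0.0
        for i, smr in enumerate(smr_db):
            if bits[i] >= max_bits:
                continue
            deficit = smr - bits[i] * DB_PER_BIT  # how far the noise is above the mask
            if deficit > best_deficit:
                best, best_deficit = i, deficit
        if best is None:
            break  # quantization noise is already masked in every subband
        bits[best] += 1
        remaining -= samples_per_subband  # one more bit for each of its 12 samples
    return bits

print(allocate_bits([20.0, 5.0, -3.0, 12.0], bit_pool=72))   # [3, 1, 0, 2]
```

The subband with a negative SMR is already inaudible and receives no bits, which is where the compression comes from.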
Frame Format:




 HEADER: contains information such as the sampling frequency that has
  been used.
 SBS: the peak amplitude level in each subband is first quantized using 6
  bits, and a further 4 bits are then used to quantize the 12 frequency
  components in the subband relative to this level. Collectively this is
  called the subband sample (SBS) format.
 Ancillary data field: an optional field at the end of the frame, used,
  for example, to carry additional coded samples associated with the
  surround sound that is present with some digital video broadcasts.
 In the decoder, the dequantizers determine the magnitude of each signal.
 The synthesis filters then produce the PCM samples at the decoder.
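The SBS idea for one subband can be sketched as a 6-bit scale-factor index for the peak amplitude, then 4-bit codes for the 12 components relative to that peak, plus the decoder-side dequantizer. The linear scale-factor mapping and the full-scale value of 1.0 are assumptions for illustration; the MPEG standard defines its own scale-factor table. The function names are hypothetical.

```python
import math
from typing import List, Tuple

SF_BITS, SAMPLE_BITS, FULL_SCALE = 6, 4, 1.0
SF_LEVELS = 2 ** SF_BITS - 1          # 63 non-zero scale-factor steps (assumed linear)
Q_MAX = 2 ** (SAMPLE_BITS - 1) - 1    # codes in [-7, 7] fit in 4 bits

def encode_subband(samples: List[float]) -> Tuple[int, List[int]]:
    """Return (scale-factor index, list of 4-bit sample codes)."""
    peak = max(abs(s) for s in samples)
    sf_index = min(SF_LEVELS, math.ceil(peak / FULL_SCALE * SF_LEVELS))
    if sf_index == 0:
        return 0, [0] * len(samples)  # silent subband
    scale = sf_index / SF_LEVELS * FULL_SCALE  # >= peak, so |s / scale| <= 1
    codes = [max(-Q_MAX, min(Q_MAX, round(s / scale * Q_MAX))) for s in samples]
    return sf_index, codes

def decode_subband(sf_index: int, codes: List[int]) -> List[float]:
    """Dequantizer: rebuild each component from the scale factor and its code."""
    scale = sf_index / SF_LEVELS * FULL_SCALE
    return [c / Q_MAX * scale for c in codes]

samples = [0.30, -0.12, 0.05, 0.21, -0.28, 0.0, 0.1, -0.05, 0.15, 0.02, -0.2, 0.25]
sf, codes = encode_subband(samples)
bits = SF_BITS + SAMPLE_BITS * len(samples)   # 6 + 12 * 4 bits for this subband
print(sf, codes, bits)
print(decode_subband(sf, codes))
```

Because the samples are quantized relative to the subband's own peak, the 4-bit codes use their full range even when the subband is quiet.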


   Various parameters associated with the encoder
 Sampling rate used: 32 ksps
 Maximum signal frequency component: 16 kHz, so with 32 subbands each
  subband has a bandwidth of 16 kHz / 32 = 500 Hz.
 12 successive sets of 32 PCM samples are used, giving:
   12 x 32 = 384 PCM samples per segment, i.e. a time duration of
    384 / 32 ksps = 12 ms.
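The encoder parameters follow from the sampling rate and the number of subbands. A minimal worked check, using only the values stated above:

```python
SAMPLE_RATE = 32_000     # samples/s
SUBBANDS = 32
SETS_PER_FRAME = 12

nyquist_hz = SAMPLE_RATE / 2                          # highest signal frequency: 16 kHz
subband_bw_hz = nyquist_hz / SUBBANDS                 # 500 Hz per subband
samples_per_frame = SETS_PER_FRAME * SUBBANDS         # 384 PCM samples
frame_duration_s = samples_per_frame / SAMPLE_RATE    # 0.012 s = 12 ms
print(subband_bw_hz, samples_per_frame, frame_duration_s)
```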
Summary of MPEG Layer 1, 2 and 3 perceptual encoders
  Layer   Application                                   Compressed bit rate
  1       Digital audio cassette                        32-448 kbps
  2       Digital audio and video broadcasting          32-192 kbps
  3       CD-quality audio over low bit rate channels   64 kbps
VIDEO COMPRESSION
What is VIDEO ?
 VIDEO is simply a sequence of digitized pictures. Video is also referred
  to as moving pictures, and the terms "frame" and "picture" are used
  interchangeably.

 APPLICATION:
   Interpersonal: Video Telephony & Video Conferencing
   Interactive: access to stored video in various forms
   Entertainment: Digital TV & MOD/VOD


 Problem with uncompressed Video:
   Raw video contains an immense amount of data
   Communication and storage capabilities are limited and expensive.
Definitions related to VIDEO:
 Bit-rate
   Information stored/transmitted per unit time
   Usually measured in Mbps (Megabits per second)
 Resolution
   Number of pixels per frame
   Ranges from 160x120 to 1920x1080
 FPS (frames per second)
   Usually 24, 25, 30, 50 or 60
   Higher rates are rarely needed because of the limitations of the human eye
Video Compression: Why?
 Bandwidth reduction:

  Application               Resolution    Uncompressed   Compressed
  Video conference          352 x 240     30.4 Mbps      64-768 kbps
  CD-ROM digital video      352 x 240     60.8 Mbps      1.5-4 Mbps
  Broadcast video           720 x 480     248.8 Mbps     3-8 Mbps
  HDTV                      1280 x 720    1.33 Gbps      20 Mbps
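The uncompressed figures in the table are consistent with 24 bits per pixel at 15, 30, 30 and 60 frames per second respectively. These frame rates and the pixel depth are inferred assumptions, since the table does not state them.

```python
# (application, width, height, assumed fps); 24 bits/pixel is also assumed.
ROWS = [
    ("Video conference", 352, 240, 15),
    ("CD-ROM digital video", 352, 240, 30),
    ("Broadcast video", 720, 480, 30),
    ("HDTV", 1280, 720, 60),
]
BITS_PER_PIXEL = 24

def raw_bit_rate(width: int, height: int, fps: int, bpp: int = BITS_PER_PIXEL) -> int:
    """Uncompressed bit rate in bits/s."""
    return width * height * fps * bpp

for name, w, h, fps in ROWS:
    print(f"{name}: {raw_bit_rate(w, h, fps) / 1e6:.1f} Mbps")
```

Dividing these raw rates by the compressed rates in the table gives the compression ratios the video coders must achieve.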
Video Compression Standards:
STANDARD   APPLICATION                                  BIT RATE
JPEG       Continuous-tone still-image compression      Variable
H.261      Video telephony and teleconferencing over    p x 64 kb/s
           ISDN
MPEG-1     Video on digital storage media (CD-ROM)      1.5 Mb/s
MPEG-2     Digital television                           > 2 Mb/s
H.263      Video telephony over PSTN                    < 33.6 kb/s
MPEG-4     Object-based coding, synthetic content,      Variable
           interactivity
H.264      From low bit rate coding to HD encoding,     Variable
           HD-DVD, surveillance, video conferencing
Video Compression Principles:
Spatial Redundancy
 Takes advantage of the similarity among most neighbouring pixels
 Occurs within a frame
Temporal Redundancy
 Takes advantage of the similarity between successive frames
 Occurs between frames and is exploited using motion estimation (ME) and
  motion compensation (MC)

Figure: successive frames 950, 951 and 952
Motion Estimation (ME): measures the movement between successive frames.
Motion Compensation (MC): the additional information that must be sent to
indicate any small differences between the predicted and actual positions
of the moving segments involved.
TYPES OF FRAME :
Intracoded (I-Frames)
 I-frames (intracoded frames) are encoded without reference to any
    other frames.
   Each frame is treated as a separate picture, and the Y, Cb and Cr
    matrices are encoded separately using JPEG, as described next.
   The compression level of I-frames is small.
   They are good for the first frame relating to a new scene in a movie.
   I-frames must be repeated at regular intervals, because a frame can be
    corrupted during transmission and, without a fresh I-frame, the whole
    picture would be lost.
   The number of frames/pictures between successive I-frames is known as
    a group of pictures (GOP). Typical values of the GOP length are N = 3
    to 12.
Audio and video compression
Encoding of I-Frame:
 RGB to YUV
   Less information is required for YUV (humans are less sensitive to
      chrominance)
   Macroblocks
     Take groups of pixels (16x16)
   Discrete Cosine Transform (DCT)
     Based on Fourier analysis: represents the signal as a sum of cosines
     Concentrates most of the signal energy in a few low-frequency
      coefficients
     Represents the pixels in each block with fewer significant numbers
   Quantization
     Reduces the data required for the coefficients
   Entropy coding
     Compresses the result
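The DCT and quantization steps for one 8x8 block can be sketched in plain Python. The sketch assumes an orthonormal 2-D DCT-II and a single uniform quantizer step of 16. JPEG itself uses an 8x8 table of step sizes, so the step value and the function names are illustrative.

```python
import math
from typing import List

N = 8
Block = List[List[float]]

def dct_2d(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an 8x8 block."""
    def c(k: int) -> float:
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs: Block, step: float) -> List[List[int]]:
    """Uniform quantization: a larger step gives fewer non-zero values (lower quality)."""
    return [[round(val / step) for val in row] for row in coeffs]

# A smooth brightness ramp: almost all the energy ends up in a few low-frequency terms.
ramp = [[100.0 + 2 * x + y for y in range(N)] for x in range(N)]
q = quantize(dct_2d(ramp), step=16)
print(q[0][0], sum(1 for row in q for v in row if v != 0))
```

For the smooth ramp, only the DC term and a couple of low-frequency coefficients survive quantization, which is why the zig-zag scan and run-length coding that follow are so effective.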
Encoding of I-Frame (cont.)
Figure: quantization (major reduction; controls 'quality'), followed by
zig-zag scan and run-length coding
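The zig-zag scan and run-length coding of the quantized block can be sketched as follows. The sketch assumes JPEG-style zig-zag order and encodes the AC coefficients as (zero-run, value) pairs ending with (0, 0) as an end-of-block marker. It omits JPEG's limit of 15 on run lengths and the subsequent Huffman coding, and the function names are hypothetical.

```python
from typing import List, Tuple

def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) positions in JPEG-style zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals r + c = s, alternating direction on each one.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def zigzag_scan(block: List[List[int]]) -> List[int]:
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_length_ac(seq: List[int]) -> List[Tuple[int, int]]:
    """Encode seq[1:] (the AC values) as (zero-run, value) pairs, ending with (0, 0) as EOB."""
    pairs, run = [], 0
    for v in seq[1:]:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # EOB: all remaining coefficients are zero
    return pairs

blk = [[0] * 8 for _ in range(8)]
blk[0][0], blk[0][1], blk[1][0], blk[1][1] = 50, 3, -2, 1
print(run_length_ac(zigzag_scan(blk)))   # [(0, 3), (0, -2), (1, 1), (0, 0)]
```

Scanning in zig-zag order groups the non-zero low-frequency values at the start, so the long tail of zeros collapses into a single end-of-block marker.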
Predictive Frame (P-frame)
 The encoding of a P-frame is relative to the contents of either a
    preceding I-frame or a preceding P-frame.
   P-frames are encoded using a combination of motion estimation and
    motion compensation.
   The accuracy of the prediction is determined by how well any movement
    between successive frames is estimated. This is known as motion
    estimation.
   Since the estimation is not exact, additional information must also be
    sent to indicate any small differences between the predicted and
    actual positions of the moving segments involved. This is known as
    motion compensation.
   The number of P-frames between I-frames is limited to avoid error
    propagation (since any error present in the first P-frame will be
    propagated to the next).
 The number of frames between a P-frame and the immediately preceding I-
    or P-frame is called the prediction span (M).
Figure: Frame sequences with I-, P- and B-frames
Bi-directional Frame (B-frame)
 For fast-moving video, e.g. movies, B-frames (bidirectional frames) are
  used. Their contents are predicted using both past and future frames.
 A B-frame is encoded relative to both the preceding and the succeeding
  I- or P-frame.
 B-frames introduce an encoding delay, because of the time needed to wait
  for the next I- or P-frame in the sequence.
 B-frames provide the highest level of compression and, because they are
  not involved in the coding of other frames, they do not propagate
  errors.
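Given the GOP length N and the prediction span M, the display-order frame pattern can be generated. So can the order in which frames must be transmitted, since each B-frame can only be decoded after the I- or P-frame that follows it. The function names are hypothetical, and the example appends the next GOP's I-frame so that the trailing B-frames have a future reference.

```python
from typing import List

def gop_pattern(n: int, m: int) -> str:
    """Display-order frame types for one GOP: I first, then a P every m frames, B elsewhere."""
    return "".join("I" if i == 0 else ("P" if i % m == 0 else "B") for i in range(n))

def transmission_order(display: str) -> List[int]:
    """Frame indices in sending order: each anchor (I/P) before the B-frames that precede it."""
    order, pending_b = [], []
    for i, t in enumerate(display):
        if t == "B":
            pending_b.append(i)      # must wait for the next anchor frame
        else:
            order.append(i)          # send the anchor first...
            order.extend(pending_b)  # ...then the B-frames that needed it
            pending_b = []
    order.extend(pending_b)          # trailing B-frames with no later anchor in the input
    return order

display = gop_pattern(12, 3) + "I"   # include the next GOP's I-frame
print(display)                                                    # IBBPBBPBBPBBI
print("".join(display[i] for i in transmission_order(display)))   # IPBBPBBPBBIBB
```

The reordering is the source of the encoding delay mentioned above: each pair of B-frames is held back until the following anchor frame has been sent.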
PB-Frames
PB-frame: this does not refer to a new frame type as such, but rather to
the way two neighbouring P- and B-frames are encoded as if they were a
single frame.
D-frame
 This frame type is application-specific and is used in MOD/VOD
  applications.
 In these applications the user may wish to fast-forward or rewind
  through the movie, which requires the compressed video to be
  decompressed at a much higher rate. To support this function, the
  encoded bitstream also contains D-frames.
Motion Estimation & Motion Compensation
(Encoding of P & B frames)
 Motion estimation involves comparing small segments of two consecutive
  frames for differences; if a difference is detected, a search is
  carried out to determine to which neighbouring segment the original
  segment has moved.
 To limit the search time, the comparison is limited to a few segments.
 P-frame: the motion that has taken place between the frame being
  encoded and the preceding I- or P-frame is estimated.
 B-frame: the motion that has taken place between the frame being
  encoded and both the preceding I- or P-frame and the succeeding I- or
  P-frame is estimated.
P-frame encoding




The digitized contents of the Y matrix associated with each frame are first
divided into two-dimensional matrices of 16 x 16 pixels, each known as a
MACROBLOCK.
 An MB consists of:
   4 DCT blocks (8 x 8) for the luminance signal
   1 DCT block each for the two chrominance signals (Cb and Cr)
 Each MB has an address associated with it.
 To encode a P-frame, the contents of each macroblock in the frame (known
  as the target frame) are compared on a pixel-by-pixel basis with the
  contents of the preceding I- or P-frame (the reference frame).

Figure: I or P reference frame and P target frame

 The search can have the following outcomes:
   If a close match is found, only the address of the MB is encoded.
   If a match is not found, the search is extended to cover an area around
    the MB in the reference frame.
 All the possible MBs in the selected search area of the reference frame
  are searched for a match.
 Case 1: if a close match is found, two parameters are encoded:
   Motion vector (V): indicates the (x, y) offset of the encoded MB. It is
    further encoded using differential encoding.
   Prediction error: consists of three matrices (one each for Y, Cb and
    Cr), each containing the difference values between those in the target
    MB and the set of pixels in the search area of the reference frame that
    produced the closest match. This is encoded using the same method as
    for I-frames.
 Case 2: if a match is not found, e.g. if the moving object has moved out
  of the extended search area:
   The MB is encoded independently, in the same way as MBs in an I-frame.
A match is said to be found if the mean of the absolute errors over all
the pixel positions in the difference macroblock (MD) is less than a
given threshold.
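Full-search block matching with the mean-absolute-error criterion, and differential encoding of the resulting motion vectors, can be sketched as follows. Assumptions: 16x16 macroblocks, a search range of +/-7 pixels, a match threshold of 2.0, and a motion vector defined as the (dx, dy) offset from the target macroblock's position to the best-matching block in the reference frame. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]

def mean_abs_error(target: Frame, ref: Frame, ty: int, tx: int,
                   ry: int, rx: int, size: int) -> float:
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(target[ty + i][tx + j] - ref[ry + i][rx + j])
    return total / (size * size)

def find_motion_vector(target: Frame, ref: Frame, ty: int, tx: int,
                       size: int = 16, search: int = 7,
                       threshold: float = 2.0) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return ((dx, dy), MAE) for the best match, or (None, MAE) if no match is close enough."""
    h, w = len(ref), len(ref[0])
    best_err, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            if ry < 0 or rx < 0 or ry + size > h or rx + size > w:
                continue  # candidate block would fall outside the reference frame
            err = mean_abs_error(target, ref, ty, tx, ry, rx, size)
            if err < best_err:
                best_err, best_mv = err, (dx, dy)
    if best_err < threshold:
        return best_mv, best_err
    return None, best_err  # no match: encode the MB as in an I-frame

def differential_mvs(mvs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Send each motion vector as the difference from the previous one."""
    prev, out = (0, 0), []
    for dx, dy in mvs:
        out.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return out

def frame_with_square(top: int, left: int, size: int = 32) -> Frame:
    f = [[0] * size for _ in range(size)]
    for y in range(top, top + 4):
        for x in range(left, left + 4):
            f[y][x] = 200
    return f

ref = frame_with_square(10, 12)
target = frame_with_square(9, 14)   # the square moved up 1 and right 2
print(find_motion_vector(target, ref, ty=8, tx=8))   # ((-2, 1), 0.0)
print(differential_mvs([(2, 1), (3, 1), (3, 0)]))    # [(2, 1), (1, 0), (0, -1)]
```

Neighbouring macroblocks tend to move together, so the differences between successive motion vectors are small and cheap to encode.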
B-frame encoding

 To encode a B-frame, any motion is estimated with reference to both the
  immediately preceding I- or P-frame and the immediately succeeding P-
  or I-frame.
 The motion vector and prediction error (difference matrices) are
  computed:
   first using the preceding frame as the reference, and
   then using the succeeding frame as the reference.
 A third motion vector and set of difference matrices are then computed
  using the target and the mean of the other two predicted sets of values
  (MD and MD').
 The set with the lowest difference values is then chosen and encoded.
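Choosing among the forward, backward and interpolated predictions can be sketched as picking the one with the lowest mean absolute difference from the target block. The sketch assumes that the two best-matching blocks (from the preceding and the succeeding reference frame) have already been found by the motion search. The function names are hypothetical.

```python
from typing import Dict, List

Block = List[List[float]]

def mean_abs_diff(a: Block, b: Block) -> float:
    n = sum(len(row) for row in a)
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def choose_b_prediction(target: Block, fwd: Block, bwd: Block) -> str:
    """Pick forward, backward or interpolated (mean) prediction with the lowest error."""
    interp = [[(x + y) / 2 for x, y in zip(rf, rb)] for rf, rb in zip(fwd, bwd)]
    errors: Dict[str, float] = {
        "forward": mean_abs_diff(target, fwd),
        "backward": mean_abs_diff(target, bwd),
        "interpolated": mean_abs_diff(target, interp),
    }
    return min(errors, key=errors.get)

t = [[10, 10], [10, 10]]
print(choose_b_prediction(t, [[8, 8], [8, 8]], [[12, 12], [12, 12]]))   # interpolated
```

Averaging the two predictions tends to cancel errors of opposite sign, which is one reason B-frames compress so well.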
Decoding of I, P, and B frames:
    I-frames :
     decode immediately to recreate original frame
    P-frames:
     The received information is decoded and the resulting information is
     used with the decoded contents of the preceding I/P frames (two
     buffers are used)
    B-frames:
     The received information is decoded and the resulting information is
     used with the decoded contents of the preceding and succeeding P or
     I frame (three buffers are used)
Implementation schematic – I-frames




 The encoding procedure used for the macroblocks that make up an I-
  frame is the same as that used in the JPEG standard to encode each 8 x 8
  block of pixels.
 Implementation Issues:
       I-frame same as JPEG implementation
       FDCT, Quantization, entropy encoding
       Assuming 4 blocks for the luminance and 2 blocks for the chrominance,
        each macroblock (MB) would require six 8x8 pixel blocks to be encoded
Implementation Schematic- P-frames
       In the case of P-frames, encoding of each macroblock is dependent on
        output of the motion estimation (ME) unit which, in turn, depends
        on the contents of the MB (target frame) being encoded and the
        contents of the macroblock in the search area (reference frame) that
        produces the closest match. There are three possibilities:

         If the two contents are the same, only the address of the macroblock
          in the reference frame is encoded
         If the two contents are very close, both the motion vector and the
          difference matrices associated with the macroblock in the reference
          frame are encoded
         If no close match is found, then the target macroblock is encoded in
          the same way as a macroblock in an I-frame
To carry out its role, the motion estimation unit, which contains the
search logic, utilizes a copy of the (uncoded) reference frame.
Implementation schematic – B-frames




The same procedure is followed for encoding B-frames, except that both the
preceding (reference) frame and the succeeding frame to the target frame
are involved.
Macroblock encoded bitstream format

For each macroblock it is necessary to identify the type of encoding that
has been used. This is the role of the formatter.
 Type: indicates the type of frame encoded (I, P or B)
 Address: identifies the location of the macroblock in the frame
 Quantization value: the value used to quantize all the DCT coefficients
     in the macroblock
     Motion vector: the encoded vector
 Block representation: indicates which of the six 8 x 8 blocks that make
     up the macroblock are present
     B1, B2, ..., B6: JPEG-encoded DCT coefficients for those blocks that
     are present
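The block representation field can be sketched as a 6-bit flag word with one bit per 8x8 block. A block is flagged present if any of its quantized coefficients is non-zero, and only flagged blocks are sent. The bit order (B1 as most significant bit) and the function names are hypothetical. The idea resembles MPEG's coded block pattern, but the exact field layout here is assumed.

```python
from typing import List

def block_pattern(blocks: List[List[int]]) -> int:
    """6-bit flag word; B1 is the most significant bit. A bit is set if the block has data."""
    if len(blocks) != 6:
        raise ValueError("expected four luminance blocks followed by Cb and Cr")
    pattern = 0
    for coeffs in blocks:
        pattern = (pattern << 1) | (1 if any(coeffs) else 0)
    return pattern

def blocks_to_send(blocks: List[List[int]]) -> List[int]:
    """1-based indices (B1..B6) of the blocks whose coefficients must be transmitted."""
    return [i + 1 for i, coeffs in enumerate(blocks) if any(coeffs)]

blocks = [[5, 0], [0, 0], [0, 0], [0, 0], [0, -1], [0, 0]]
print(bin(block_pattern(blocks)), blocks_to_send(blocks))   # 0b100010 [1, 5]
```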
MPEG (Moving Picture Experts Group)
 A committee of experts, established in 1988, that develops video
  encoding standards.
 Until recently, it was the only game in town (and it is still by far the
  most popular).
 Suitable for a wide range of videos:
   low resolution to high resolution
   slow movement to fast action
 Can be implemented either in software or in hardware.
MPEG:
       MPEG-1: ISO Recommendation 11172
        The source intermediate digitization format (SIF) is used.
        Uses a resolution of 352 x 288 pixels and is used for VHS-quality
         audio and video on CD-ROM at a bit rate of 1.5 Mbps.
       MPEG-2: ISO Recommendation 13818
        Used in the recording and transmission of studio-quality audio and
         video.
        Different levels of video resolution are possible:
          Low: 352 x 288 pixels, comparable with MPEG-1
          Main: 720 x 576 pixels, studio-quality video and audio, bit
                 rates up to 15 Mbps
          High: 1920 x 1152 pixels, used in wide-screen HDTV; bit rates of
                 up to 80 Mbps are possible
       MPEG-4: used for interactive multimedia applications over the
        Internet and over various entertainment networks.
         The MPEG-4 standard contains features that enable a user not only
          to access a video sequence passively (using, for example,
          start/stop controls) but also to manipulate the individual
          elements that make up a scene within a video.
         In MPEG-4, each video frame is segmented into a number of video
          object planes (VOPs), each of which corresponds to an audio-visual
          object (AVO) of interest.
MPEG-1




• Uses a similar video compression technique to H.261; the digitization
format used is the source intermediate format (SIF), with progressive
scanning at a refresh rate of 30 Hz (for NTSC) and 25 Hz (for PAL).
Performance
   Compression ratios for I-frames are similar to those of JPEG for video,
    typically 10:1 to 20:1, depending on the complexity of the frame
    contents.
   P- and B-frames achieve higher compression: in the region of 20:1 to
    30:1 for P-frames and 30:1 to 50:1 for B-frames.
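The overall compression ratio of a GOP depends on its mix of frame types. A small sketch, assuming hypothetical mid-range ratios taken from the ranges above (15:1 for I-, 25:1 for P- and 40:1 for B-frames) and the 12-frame pattern IBBPBBPBBPBB:

```python
from typing import Dict

def gop_compression_ratio(pattern: str, ratios: Dict[str, float]) -> float:
    """Overall ratio for a GOP, given a compression ratio per frame type.

    Each uncompressed frame has size 1; a frame of type t compresses to 1 / ratios[t].
    """
    compressed = sum(1.0 / ratios[t] for t in pattern)
    return len(pattern) / compressed

# Hypothetical mid-range ratios picked from the ranges quoted above.
ratios = {"I": 15.0, "P": 25.0, "B": 40.0}
print(round(gop_compression_ratio("IBBPBBPBBPBB", ratios), 1))
```

Even though the single I-frame compresses least, the eight B-frames pull the overall ratio to about 31:1 for this pattern.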
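Video Compression – MPEG-1 video bitstream structure: composition
• The compressed bitstream produced by the video encoder is hierarchical:
at the top level is the complete compressed video (sequence), which
consists of a string of groups of pictures; each group of pictures is made
up of pictures, each picture of slices, each slice of macroblocks, and each
macroblock of 8 x 8 blocks.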
Video Compression – MPEG-1 video bitstream structure: format
• In order for the decoder to decompress the received bitstream, each data
structure must be clearly identified within the bitstream.
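In MPEG-1 video, this identification is done with start codes: the byte prefix 0x00 0x00 0x01 followed by one code byte. A small scanner is sketched below, assuming the MPEG-1 video code values listed in the dictionary: sequence header 0xB3, group of pictures 0xB8, picture 0x00, slices 0x01 to 0xAF, and sequence end 0xB7. The function names are hypothetical, and the demo payload bytes are arbitrary filler rather than valid header contents.

```python
from typing import List, Tuple

START_CODES = {0xB3: "sequence_header", 0xB8: "group_of_pictures",
               0x00: "picture", 0xB7: "sequence_end"}

def code_name(code: int) -> str:
    if 0x01 <= code <= 0xAF:
        return "slice"
    return START_CODES.get(code, "other")

def find_start_codes(stream: bytes) -> List[Tuple[int, str]]:
    """Return (byte offset, structure name) for every 0x000001xx start code."""
    found = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(stream):
        found.append((i, code_name(stream[i + 3])))
        i = stream.find(b"\x00\x00\x01", i + 3)
    return found

demo = (b"\x00\x00\x01\xb3" + b"\x16\x01" + b"\x00\x00\x01\xb8" + b"\x08"
        + b"\x00\x00\x01\x00" + b"\x10" + b"\x00\x00\x01\x01" + b"\xff"
        + b"\x00\x00\x01\xb7")
print(find_start_codes(demo))
```

Because every structure begins with a unique start code, a decoder can also resynchronise at the next slice or picture after a transmission error.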
Video Compression – MPEG-4 coding principles




• Content based video coding principles showing how a frame/
scene is defined in the form of multiple video object planes
Video Compression – MPEG-4 encoder/decoder schematic




• Before being compressed each scene is defined in the form
of a background and one or more foreground audio-visual
objects (AVOs)
Video Compression – MPEG VOP encoder




The audio associated with an AVO is compressed using one of
the algorithms described before and depends on the available
bit rate of the transmission channel and the sound quality
required

Audio and video compression

  • 2. Introduction  Audio signal or analog signal uses PCM Digitization process which involves SAMPLING.  Sampling rate > or = to : 2(Highest frequency component).  Band-limited Signal: When the BW of comm. Channel to be used is less than minimum sampling rate then signal needs to be bandlimited.  Speech Signal:(15Hz-10kHz)  Max. freq. component is 10kHz  Minimum Sampling rate: 2x10=20 ksps  Bits per sample=12bits per sample  Bit rate used: (Sampling rate X Bits per sample) =240 kbps  General Audio Signal:(50Hz-20kHz)  Max. freq. component is 20kHz  Minimum Sampling rate: 2x20=40 ksps  Bits per sample=16 bits per sample  Bit rate used: 1.28 Mbps
  • 3. How the concept of Audio Compression comes?  In most MM applications, BW of communication channel that are available does not support such high bit rates of 240kbps and 1.28Mbps but offers less bit rates….   So what is the solution?????? There are two solutions…they are………  Solution 1:Audio signal is sampled at lower rate! (BAD ONE)   Merit: Simple to implement  Demerit: 1.Quality of decoded signal is reduced resulting in loss of HF components from orignal signal 2. Use of few bps results in high QN  Solution 2: Compression Algorithm can be used! (GOOD ONE)   Give good perceptual quality  Reduced BW requirement  Further discussion is on Audio Compression Methods……
  • 4. 1. Differential Pulse Code Modulation (DPCM)  Differential pulse code modulation is a derivative of the standard PCM  It uses the fact that the range of differences in amplitudes between successive samples of the audio waveform is less than the range of the actual sample amplitudes  Hence fewer bits are required to represent the difference signals than in case of PCM for the same sampling rate.  It reduces the bit rate requirements from 64kbps to 56kbps.
  • 6. Operation of DPCM: Encoder  Previously digitized sample is held in the register (R)  The DPCM signal is computed by subtracting the current contents (Ro) from the new output by the ADC (PCM)  The register value is then updated before transmission  DPCM=PCM-R0 Decoder  Decoder simply adds the previous register contents (PCM) with the DPCM  R1=R0+DPCM Limitation of DPCM:  ADC operations introduces quantization errors each time and will introduce cumulative errors in the value stored in the register(R).  So previous value (R) is only approximation!!!!!!!! ...We really need more accurate version of previous signal that we got in................
  • 7. 2. Third Order Predictive DPCM  To eliminate this noise effect predictive methods are used to predict a more accurate version of the previous signal (use not only the current signal but also varying proportions of a number of the preceding estimated signals)  These proportions used are known as predictor coefficients  Difference signal is computed by subtracting varying proportions of the last three predicted values from the current output by the ADC.  It reduces the bit rate requirements from 64kbps to 32kbps.
  • 8. Third-order predictive DPCM signal encoder and decoder
  • 9. Operation of Third Order Predictive DPCM  R1, R2, R3 will be subtracted from PCM  The values in the R1 register will be transferred to R2 and R2 to R3 and the new predicted value goes into R1  Decoder operates in a similar way by adding the same proportions of the last three computed PCM signals to the received DPCM signal
  • 10. 3. Adaptive differential PCM (ADPCM)  FirstADPCM International standard is defined in ITU-T Recommendation G.721  Savings of bandwidth is possible by varying the number of bits used for difference signal depending on its amplitude (fewer bits to encode smaller difference signals)  Based on the same principle as the DPCM except an eight-order predictor is used and the number of bits used to quantize each difference is varied  This can be either 6 bits – producing 32 kbps – to obtain a better quality output than with third order DPCM, or 5 bits- producing 16 kbps – if lower bandwidth is more important  Second ADPCM International standard is defined in ITU-T Recommendation G.722  Better sound quality at the cost of added complexity.  Input speech BW is extended from 50-7kHz compared with 3.4kHz for a standard PCM system  Wider BW give rise to high quality........as you need in video conferencing..
  • 11.  This uses Subband coding  In this coding input signal prior to sampling is passed through two filters:  One passes only signal frequencies in the range 50Hz - 3.5kHz and  Other only frequencies in the range 3.5kHz - 7kHz  By doing this the input signal is effectively divided into two separate equal-bandwidth signals,  first known as the lower subband signal and,  second the upper subband signal  Each is then sampled and encoded independently using ADPCM.  The use of two subbands has the advantage that different bit rates can be used for each.  The two bitstreams are multiplexed to produce the transmitted signal – in such a way that, decoder in the receiver is able to divide them back again into two separate streams for decoding.  Operating bit rates are 64,56, or 48kbps.
  • 12. ADPCM subband encoder and decoder schematic
  • 13. 4. Adaptive predictive coding  Even higher levels of compression possible at higher levels of complexity  These can be obtained by also making predictor coefficients adaptive  In practice, the optimum set of predictor coefficients continuously vary since they are a function of the characteristics of the audio signal being digitized  The optimum set of coefficients are then computed and these are used to predict more accurately the previous signal  This type of compression can reduce the bandwidth requirements to 8kbps while still obtaining an acceptable perceived quality
  • 14. 5.Linear predictive coding  With this coding Perceptual Features of an audio waveform are analysed by the source first.  These are then quantized and sent and the destination uses them, together with a sound synthesizer, to regenerate a sound that is perceptually comparable with the source audio signal  With this compression technique, although the generated speech can often be sound synthetic, but very high levels of compressions can be achieved.  Now, what are those perceptual features.....need to be analyzed..........?????
  • 15.  In terms of speech, Three features which determine the perception of a signal by the ear are its:  Pitch: This is closely related to the frequency of the signal. This is important since ear is more sensitive to signals in the range 2-5kHz  Period: This is the duration of the signal  Loudness: This is determined by the amount of energy in the signal  In addition, orign of sound is also important. These are called vocal tract excitation parameters  Voiced sound: These are generated through vocal chords. e.g letters m,v, and i.  Unvoiced sound: With these vocal chords are open. e.g letters f and s.
  • 16. Operation of LPC encoder and decoder ENCODER:  The input speech waveform is first sampled and quantized at a defined rate.  A block of digitized samples – known as segment - is then analysed to determine the various perceptual parameters of the speech that it contains.  The output of the encoder is a string of frames, one for each segment  Each frame contains:  fields for pitch and loudness  Period determined by the sampling rate being used  Notification of whether the signal is voiced or unvoiced  Set of computed modal coefficients  Some LPC encoders uses up to 10 set of previous model coefficients to predict the output sound called LPC-10 and uses bit rates as low as 2.4kbps-1.2 kbps.
  • 17. Operation of LPC encoder and decoder cont.. .......... DECODER  Speech signal generated by vocal tract model in the decoder is a function of the:  Present output of speech synthesizer (as determined by the current state of model coefficients).  Plus a linear combination of previous set of model coefficients. APPLICATION  Generated sound at this low rate is very synthetic and so LPC encoders are used primarily in Military Applications where BW is all important.
  • 18. Linear predictive coding (LPC) signal encoder and decoder
  • 20. 6. Code-excited LPC (CELPC)  The synthesiser used in most LPC decoders are based on a very basic model of the vocal tract  These are intended for use with applications in which the amount of bandwidth available is limited but the perceived quality of the speech must be of acceptable standard for use in various multimedia applications  In CELPC model instead of treating each digitized segment independently for encoding purposes, just a limited set of segments are used, each known as a wave template  A pre computed set of templates are held by the encoder and the decoder in what is known as the template codebook  Each of the individual digitized samples that make up a particular template in the codebook are differently encoded
  • 21.  All coders of this type have a delay associated with them which is incurred while each block of digitized samples is analysed by the encoder and the speech is reconstructed at the decoder  The combined delay value is known as the coder’s processing delay  In addition before the speech samples can be analysed it is necessary to buffer the block of samples  The time to accumulate the block of samples is known as the algorithmic delay  The coders delay an important parameter in conventional telephony application, a low-delay coder is required whereas in an interactive application delay of several seconds before the speech starts is acceptable
  • 22. Perceptual Coding (PC)  LPC and CELP are used only for telephony applications and hence compression of speech signal.  PC are designed for compression of general audio such as that associated with a digital television broadcast.  Use a psychoacoustic model (this exploits a number of limitations of human ear).  Using this approach, sampled segments of the source audio waveform are analysed – but only those features that are perceptible to the ear are transmitted.  E.g although the human ear is sensitive to signals in the range 15Hz to 20 kHz, the level of sensitivity to each signal is non-linear; that is the ear is more sensitive to some signals than others.  WHAT IS THAT LIMITATION OF HUMAN EARS..................?????? ................. MASKING.........EFFECT 
  • 23.  Frequency Masking: When multiple signals are present in audio, a strong signal may reduce the level of sensitivity of the ear to other signals which are near to it in frequency.  Temporal masking: When the ear hears a loud sound it takes a short but a finite time before it could hear a quieter sound.  Psychoacoustic Model is used to identify those signals which are influenced by masking and these are then eliminated from the transmitted signal........and hence compression is achieved ...
  • 24. Sensitivity of the ear:  The dynamic range of ear is defined as the loudest sound it can hear to the quietest sound  Sensitivity of the ear varies with the frequency of the signal as shown....in next slide.  The ear is most sensitive to signals in the range 2-5kHz hence the signals in this band are the quietest the ear is sensitive to.  Vertical axis gives all the other signal amplitudes relative to this signal (2-5 kHz).  In the fig. although the Signal A & B have same relative amplitude, signal A would be heard only because it is above the hearing threshold and B is below the hearing threshold.
  • 25. Sensitivity of the ear varies with the frequency as....
  • 26. Frequency Masking When an audio sound consists of multiple frequency signals is present, the sensitivity of the ear changes and varies with the relative amplitude of the signal
  • 27. Conclusions from diagram:  Signal B is larger than signal A. This causes the basic sensitivity curve of the ear to be distorted in the region of signal B  Signal A will no longer be heard as it is within the distortion band. Variation of frequency masking effect with frequency:  Masking effect at various frequencies 1, 4, and 8kHz are shown as:  Width of masking curve (means range of frequencies that are affected) increases with increasing frequency.  The width of each curve at a particular signal level is known as the critical bandwidth for that frequency.  For frequencies greater than 500Hz critical bandwidth increases linearly in multiples of 100Hz.
  • 28. Variation of frequency masking effect with frequency
  • 29. Temporal masking  After the ear hears a loud sound it takes a further short time before it can hear a quieter sound.  This is known as the temporal masking.  After the loud sound ceases it takes a short period of time for the signal amplitude to decay.  During this time, signals whose amplitudes are less than the decay envelope will not be heard and hence need not be transmitted.  In order to exploit this phenomenon, the input audio waveform must be processed over a time period that is comparable with that associated with temporal masking.
  • 30. Temporal masking caused by loud signal
  • 31. Audio Compression – MPEG Audio coder MOTION PICTURE EXPERT GROUP was formed by the ISO to formulate a set of standards relating to a range of Multimedia applications that involves the use of video with sound. The coder associated with Audio Compression form a part of these standards are known as MPEG audio coders
  • 32. Why Do We Need International Standards?  International standardization is conducted to achieve inter-operability .  Only syntax and decoder are specified.  Encoder is not standardized and its optimization is left to the manufacturer.  Standards provide state-of-the-art technology that is developed by a group of experts in the field.  Not only solve current problems, but also anticipate the future application requirements.
  • 33. MPEG AUDIO CODER FORWARD ADAPTIVE BIT ALLOCATION MODE
  • 34. MPEG audio coder  The audio input signal is first sampled and quantized using PCM.  The bandwidth available for transmission is divided into a number of frequency subbands using a bank of analysis filters.  Analysis filter bank:  Maps each set of 32 (time related) PCM samples into an equivalent set of 32 frequency samples.  Determines the peak amplitude in each subband (consisting of 12 freq. components) called scaling factor.  Processing associated with both frequency and temporal masking is carried out by the psychoacoustic model.  In basic encoder the time duration of each sampled segment of the audio input signal is equal to the time to accumulate 12 successive sets of 32 PCM.  12 sets of 32 PCM time samples are converted into frequency components using DFT.
  • 35.  The output of the psychoacoustic model is a set of what are known as signal-to-mask ratios (SMRs) and indicate the frequency components whose amplitude is below the audible threshold.  This is done to have more bits for highest sensitivity regions compared with less sensitive regions.  In an encoder all the frequency components are carried in a frame.
  • 36. Frame Format:  HEADER: contains information such as the sampling frequency that has been used  SBS:The peak amplitude level in each subband is first quantized using 6 bits and a further 4 bits are then used to quantize the 12 frequency components in the subband relative to this level. Collectively this is called Subband Sample format.  Ancillary data field: at the end of the frame optional. for example: used to carry additional coded samples associated with the surround-sound that is present with some digital video broadcasts.
  • 37.  At the decoder section the de-quantizers will determine the magnitude of each signal  The synthesis filters will produce the PCM samples at the decoders Various Parameters associated with Encoder  Sampling rate used : 32ksps  Max. Signal freq. Component: 16khz so each subband has BW=500Hz.  12 successive set of 32 PCM are used having: Time duration = (12X32)=384 PCM samples
  • 38. Summary of MPEG layer 1,2 and 3 Perceptual Encoders Layer Application Compressed bit rate 1 Digital Audio cassette 32-448kbps Digital Audio and Video 2 broadcasting 32-192kbps CD-quality audio over low bit rate 3 64kbps channels
  • 40. What is VIDEO ?  VIDEO is simply a sequence of digitized pictures, video is also referred to as moving pictures and the terms “frames” and “picture” are used interchangeably.  APPLICATION:  Interpersonal: Video Telephony & Video Conferencing  Interactive: access to stored video in various forms  Entertainment: Digital TV & MOD/VOD  Problem with uncompressed Video:  Raw video contains an immense amount of data  Communication and storage capabilities are limited and expensive.
  • 41. Definitions related to VIDEO:  Bit-rate  Information stored/transmitted per unit time  Usually measured in Mbps (Megabits per second)  Resolution  Number of pixels per frame  Ranges from 160x120 to 1920x1080  FPS (frames per second)  Usually 24, 25, 30, 50 or 60  Don’t need more because of limitations of the human eye
  • 42. Video Compression: Why?  Bandwidth Reduction…………………. Application Data Rate Uncompressed Compressed Video Conference 352 X 240 30.4 Mbps 64 - 768 kbps CD-ROM Digital Video 352 X 240 60.8 Mbps 1.5 - 4 Mbps Broadcast Video 720 X 480 248.8 Mbps 3 - 8 Mbps HDTV 1280 X 720 1.33 Gbps 20 Mbps
  • 43. Video Compression Standards: STANDARD APPLICATION BIT RATE JPEG Continuous-tone still-image Variable compression H.261 Video telephony and p x 64 kb/s teleconferencing over ISDN MPEG-1 Video on digital storage media 1.5 Mb/s (CD-ROM) MPEG-2 Digital Television > 2 Mb/s H.263 Video telephony over PSTN < 33.6 kb/s MPEG-4 Object-based coding, synthetic Variable content, interactivity H.264 From Low bitrate coding to HD Variable encoding, HD-DVD, Surveillance, Video conferencing.
  • 45. Spatial Redundancy  Take advantage of similarity among most neighboring pixels  Occur inside frame
  • 46. Temporal Redundancy  Take advantage of similarity between successive frames  Is measured in between the frames: measure ME & MC 950 951 952
  • 47. Motion Estimation (ME): To measure movement between successive frames. Motion Compensation (MC): This is the additional information that must be sent to indicate any small differences between the predicted and actual positions of the moving segments involved
  • 49. Intracoded (I-Frames)
    - I-frames (intracoded frames) are encoded without reference to any other frames.
    - Each frame is treated as a separate picture, and the Y, Cb and Cr matrices are encoded separately using JPEG (see next slide).
    - The compression level achieved for I-frames is relatively small.
    - They are well suited to the first frame of a new scene in a movie.
    - I-frames must be repeated at regular intervals, since a frame corrupted during transmission could otherwise cause the whole picture to be lost.
    - The frames from one I-frame up to the next form a group of pictures (GOP). The number of frames in a GOP is denoted N; typical values of N range from 3 to 12.
  • 51. Encoding of I-Frame:
    - RGB to YUV: less information is required for YUV, because humans are less sensitive to chrominance than to luminance, so the chrominance can be subsampled.
    - Macroblocks: take groups of 16x16 pixels.
    - Discrete Cosine Transform (DCT): related to Fourier analysis; represents each 8x8 block of pixels as a weighted sum of cosine basis functions. It concentrates most of the signal energy in a few low-frequency coefficients, so the block can be represented with fewer significant numbers.
    - Quantization: reduces the data required for the coefficients.
    - Entropy coding: compresses the result losslessly.
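The DCT step can be sketched straight from the orthonormal DCT-II definition. This is a deliberately slow O(n^4) version written for clarity (real codecs use fast factorizations), and it skips the level shift by 128 that JPEG applies first. It shows that a smooth block ends up with its energy in the low-frequency (top-left) coefficients.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows.

    Direct evaluation of the textbook formula; for illustration only."""
    n = len(block)
    # Normalization factors C(k): sqrt(1/n) for k = 0, sqrt(2/n) otherwise
    scale = [math.sqrt(1 / n)] + [math.sqrt(2 / n)] * (n - 1)
    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale[u] * scale[v] * total
    return coeffs


# A smooth 8x8 block: brightness rises slowly from left to right.
smooth = [[100 + 2 * y for y in range(8)] for x in range(8)]
coeffs = dct_2d(smooth)
print(round(coeffs[0][0], 1))  # DC term: 856.0
```

Because the block only varies along its columns, every coefficient outside the first row is zero, and the DC term dominates the rest.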
  • 52. Encoding of I-Frame cont.: quantization gives the major reduction in data and controls the 'quality'; the quantized coefficients are then zig-zag scanned and run-length coded.
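The quantize, zig-zag and run-length steps can be sketched as below. This is a simplified illustration: it assumes a single uniform quantizer step for every coefficient (JPEG and MPEG actually use a quantization matrix), and it stops at (zeros-before, value) pairs plus an end-of-block marker, without the variable-length code tables a real encoder applies afterwards.

```python
def quantize(coeffs, step):
    """Uniform quantization (ASSUMPTION: one step for all coefficients)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def zigzag_order(n=8):
    """(row, col) visiting order of the JPEG-style zig-zag scan.

    Walks the anti-diagonals r + c = 0, 1, 2, ...; even diagonals are read
    bottom-left to top-right, odd ones top-right to bottom-left."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length(seq):
    """Encode as (zeros_before, value) pairs; trailing zeros become 'EOB'."""
    pairs, zeros = [], 0
    for value in seq:
        if value == 0:
            zeros += 1
        else:
            pairs.append((zeros, value))
            zeros = 0
    pairs.append("EOB")  # end of block: all remaining coefficients are zero
    return pairs


# Usage: a block whose energy sits in a few low-frequency coefficients.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[2][2] = 1024, -30, 12, 3
print(run_length(zigzag_scan(quantize(coeffs, 16))))
```

With a step of 16 the small coefficient at (2, 2) quantizes to zero, so 64 coefficients collapse to three pairs and an end-of-block marker.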
  • 53. Predictive Frame (P-frame)
    - A P-frame is encoded relative to the contents of either a preceding I-frame or a preceding P-frame.
    - P-frames are encoded using a combination of motion estimation and motion compensation.
    - The accuracy of the prediction depends on how well any movement between successive frames is estimated; this is known as motion estimation.
    - Since the estimation is not exact, additional information must also be sent to indicate any small differences between the predicted and actual positions of the moving segments involved; this is known as motion compensation.
    - The number of P-frames between I-frames is limited to avoid error propagation, since any error present in the first P-frame is propagated to the next.
    - The distance, in frames, between a P-frame and the immediately preceding I- or P-frame is called the prediction span (M).
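Given the GOP size N and the prediction span M, the frame-type pattern of a GOP follows directly. The sketch below assumes a regular GOP in which M is read as the distance in frames from one anchor (I or P) frame to the next, so M = 1 means consecutive P-frames with no B-frames in between.

```python
def gop_pattern(n: int, m: int) -> str:
    """Display-order frame types for one regular GOP.

    n: frames per GOP, m: prediction span (distance between anchor frames).
    Frame 0 is the I-frame, every m-th frame after it is a P-frame and the
    frames in between are B-frames."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    return "".join(
        "I" if k == 0 else ("P" if k % m == 0 else "B") for k in range(n)
    )


print(gop_pattern(12, 3))  # IBBPBBPBBPBB
```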
  • 54. Frame Sequences I-, P- and B-frames
  • 55. Bi-directional Frame (B-frame)
    - For fast-moving video, e.g. movies, B-frames (bi-directional frames) are used. Their contents are predicted using both past and future frames.
    - A B-frame is encoded relative to both the preceding and the succeeding I- or P-frame.
    - B-frames introduce an encoding delay, because the encoder must wait for the next I- or P-frame in the sequence.
    - B-frames provide the highest level of compression, and because they are not used in the coding of other frames, they do not propagate errors.
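The encoding delay comes from the fact that a B-frame cannot be coded, or sent, until the later anchor frame it depends on has been coded. A minimal sketch of that reordering, from display order to transmission order, is shown below. The frame labels such as "I0" and "B1" are a hypothetical notation (type letter plus display index), and the sketch rejects a sequence that ends in B-frames, since those would need the next GOP's anchor frame.

```python
def transmission_order(frames):
    """Reorder display-order frame labels so every B-frame follows both of
    the anchor (I/P) frames it is predicted from."""
    out, pending_b = [], []
    for frame in frames:
        if frame[0] == "B":
            pending_b.append(frame)   # must wait for the next anchor frame
        else:
            out.append(frame)         # send the anchor first ...
            out.extend(pending_b)     # ... then the B-frames that needed it
            pending_b.clear()
    if pending_b:
        raise ValueError("trailing B-frames need a following I- or P-frame")
    return out


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(transmission_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

B1 is displayed second but can only be sent third, after P3: that wait is the delay the slide refers to.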
  • 56. PB-Frames: a PB-frame does not refer to a new frame type as such, but rather to the way two neighbouring P- and B-frames are encoded as if they were a single frame.
  • 57. D-frame
    - D-frames are application-specific and are used in MOD/VOD applications.
    - In these applications users may wish to fast-forward or rewind through the movie, which requires the compressed video to be decompressed at a much higher rate. To support this function, the encoded bitstream also contains D-frames.
  • 58. Motion Estimation & Motion Compensation (Encoding of P & B frames)
    - Motion estimation involves comparing small segments of two consecutive frames for differences; when a difference is detected, a search is carried out to determine to which neighbouring segment the original segment has moved.
    - To limit the search time, the comparison is limited to a few segments.
    - P-frame: estimate the motion that has taken place between the frame being encoded and the preceding I- or P-frame.
    - B-frame: estimate the motion that has taken place between the frame being encoded and both the preceding and the succeeding I- or P-frame.
  • 59. P-frame encoding: the digitized contents of the Y matrix associated with each frame are first divided into two-dimensional matrices of 16 x 16 pixels, each known as a MACROBLOCK.
  • 60. A macroblock (MB) consists of:
    - 4 DCT blocks (8x8) for the luminance signal
    - 1 DCT block (8x8) for each of the two chrominance signals (Cb and Cr)
    - Each MB has an address associated with it.
    - To encode a P-frame, the contents of each macroblock in the frame being encoded (the target frame) are compared on a pixel-by-pixel basis with the contents of the preceding I- or P-frame (the reference frame). The outcome of this search may be:
      - If a close match is found, only the address of the MB is encoded.
      - If no match is found, the search is extended to cover an area around the MB in the reference frame.
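The 4 + 1 + 1 block structure of a macroblock can be sketched as follows. The sketch assumes the chrominance arrives at full 16x16 resolution and is subsampled to 8x8 by averaging each 2x2 group of samples (4:2:0 sampling); real encoders usually subsample the whole frame beforehand, and the function name is hypothetical.

```python
def split_macroblock(y, cb, cr):
    """Turn a 16x16 macroblock into the six 8x8 blocks passed to the DCT.

    y, cb, cr: 16x16 lists of rows. Returns [Y1, Y2, Y3, Y4, Cb, Cr]."""
    def quadrant(plane, r0, c0):
        # 8x8 block starting at row r0, column c0
        return [row[c0:c0 + 8] for row in plane[r0:r0 + 8]]

    def subsample(plane):
        # ASSUMPTION: 4:2:0 by averaging each 2x2 group of chroma samples
        return [[(plane[2 * r][2 * c] + plane[2 * r][2 * c + 1]
                  + plane[2 * r + 1][2 * c] + plane[2 * r + 1][2 * c + 1]) / 4
                 for c in range(8)] for r in range(8)]

    # Luminance quadrants in raster order: top-left, top-right, bottom-left, bottom-right
    luminance = [quadrant(y, r0, c0) for r0 in (0, 8) for c0 in (0, 8)]
    return luminance + [subsample(cb), subsample(cr)]


y = [[16 * r + c for c in range(16)] for r in range(16)]
cb = [[c for c in range(16)] for r in range(16)]
cr = [[100] * 16 for _ in range(16)]
blocks = split_macroblock(y, cb, cr)
print(len(blocks))  # 6
```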
  • 61. All the possible MBs in the selected search area of the reference frame are searched for a match.
    - Case 1: if a close match is found, two parameters are encoded:
      - Motion vector (V): indicates the (x, y) offset of the matching MB. It is further encoded using differential encoding.
      - Prediction error: three matrices (one each for Y, Cb and Cr), each containing the difference values between the target MB and the set of pixels in the search area of the reference frame that produced the closest match. These are encoded using the same method as for I-frames.
    - Case 2: if no match is found, e.g. because the moving object has moved out of the extended search area, the MB is encoded independently, in the same way as the MBs in an I-frame.
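Differential encoding of motion vectors pays off because neighbouring macroblocks of a moving object tend to have similar vectors, so the differences are small. The sketch below assumes the predictor is simply the previous macroblock's vector, starting from (0, 0); the actual standards define their own reset rules (for example at the start of a row or slice), which are not modelled here.

```python
def encode_mvs(vectors):
    """Send each (dx, dy) as the difference from the previous vector.

    ASSUMPTION: predictor = previous macroblock's vector, initially (0, 0)."""
    prev, out = (0, 0), []
    for dx, dy in vectors:
        out.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return out


def decode_mvs(diffs):
    """Invert encode_mvs by accumulating the differences."""
    prev, out = (0, 0), []
    for ddx, ddy in diffs:
        prev = (prev[0] + ddx, prev[1] + ddy)
        out.append(prev)
    return out


vectors = [(2, 1), (2, 1), (3, 1), (3, 0)]
print(encode_mvs(vectors))  # [(2, 1), (0, 0), (1, 0), (0, -1)]
```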
  • 62. A match is said to be found if the mean of the absolute errors over all the pixel positions in the difference macroblock (MD) is less than a given threshold.
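The search and match criterion can be sketched as an exhaustive (full-search) block matcher using the mean absolute error. The search range `p` and the `threshold` default are hypothetical values chosen for illustration; the slides do not specify them. Frames are lists of rows, indexed as `frame[y][x]`.

```python
def mae(target, ref, tx, ty, rx, ry, n):
    """Mean absolute error between the n x n block at (tx, ty) in the target
    frame and the n x n block at (rx, ry) in the reference frame."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(target[ty + i][tx + j] - ref[ry + i][rx + j])
    return total / (n * n)


def full_search(target, ref, tx, ty, n=16, p=7, threshold=4.0):
    """Try every offset within +/-p pixels (HYPOTHETICAL defaults for p and
    threshold). Returns (motion_vector, best_mae, matched)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_err = None, float("inf")
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = tx + dx, ty + dy
            # Only consider candidate blocks lying fully inside the reference frame
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                err = mae(target, ref, tx, ty, rx, ry, n)
                if err < best_err:
                    best_mv, best_err = (dx, dy), err
    return best_mv, best_err, best_err < threshold


# Usage: the target frame is the reference shifted by 2 pixels left, 1 up.
ref = [[8 * y + x for x in range(8)] for y in range(8)]
target = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
           for x in range(8)] for y in range(8)]
print(full_search(target, ref, 2, 2, n=4, p=2))  # ((2, 1), 0.0, True)
```

Full search is the simplest and most expensive option; the slide's remark about limiting the search time is why practical encoders restrict `p` or use faster search patterns.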
  • 64. B-frame encoding
    - To encode a B-frame, any motion is estimated with reference to both the immediately preceding I- or P-frame and the immediately succeeding P- or I-frame.
    - The motion vector and prediction error (difference matrices) are computed first using the preceding frame as the reference and then using the succeeding frame as the reference.
    - A third motion vector and set of difference matrices are then computed using the target and the mean of the other two predicted sets of values (MD and MD').
    - The set with the lowest difference values is then chosen and encoded.
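The choice between the three B-frame predictions can be sketched as follows, assuming the forward and backward candidates have already been motion compensated and that "lowest difference" is measured by the mean absolute error. The function names are hypothetical.

```python
def block_mae(a, b):
    """Mean absolute error between two equally sized blocks."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def choose_b_prediction(target, forward, backward):
    """Pick the prediction for one B-frame block.

    forward / backward: motion-compensated blocks from the preceding and the
    succeeding reference frames; the third candidate is their mean.
    Returns (mode, difference_matrix) for the lowest-MAE candidate."""
    interpolated = [[(f + b) / 2 for f, b in zip(rf, rb)]
                    for rf, rb in zip(forward, backward)]
    candidates = {"forward": forward, "backward": backward,
                  "interpolated": interpolated}
    mode = min(candidates, key=lambda k: block_mae(target, candidates[k]))
    pred = candidates[mode]
    diff = [[t - p for t, p in zip(rt, rp)] for rt, rp in zip(target, pred)]
    return mode, diff


forward = [[10, 10], [10, 10]]
backward = [[20, 20], [20, 20]]
print(choose_b_prediction([[15, 15], [15, 15]], forward, backward)[0])
# interpolated
```

Averaging the two references is what lets B-frames handle objects that are only partially visible in either neighbour, which is part of why they compress so well.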
  • 66. Decoding of I-, P- and B-frames:
    - I-frames: decoded immediately to recreate the original frame.
    - P-frames: the received information is decoded, and the result is used together with the decoded contents of the preceding I- or P-frame (two buffers are used).
    - B-frames: the received information is decoded, and the result is used together with the decoded contents of the preceding and succeeding I- or P-frames (three buffers are used).
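On the decoder side, a P-frame block is rebuilt by fetching the block the motion vector points to in the already decoded reference frame and adding the decoded prediction error. A minimal sketch, with a hypothetical function name and frames stored as lists of rows:

```python
def reconstruct_block(ref, x, y, mv, diff):
    """Motion-compensated reconstruction of one block.

    ref: previously decoded reference frame, (x, y): top-left corner of the
    block in the frame being decoded, mv: (dx, dy) motion vector,
    diff: decoded prediction-error matrix."""
    dx, dy = mv
    return [[ref[y + dy + i][x + dx + j] + diff[i][j]
             for j in range(len(diff[0]))]
            for i in range(len(diff))]


ref = [[8 * r + c for c in range(8)] for r in range(8)]
print(reconstruct_block(ref, 2, 2, (1, -1), [[1, 0], [0, -1]]))
# [[12, 12], [19, 19]]
```

A B-frame block is rebuilt the same way, except that the prediction comes from the preceding frame, the succeeding frame or their mean, which is why the decoder needs the third buffer.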
  • 67. Implementation schematic – I-frames
    - The encoding procedure used for the macroblocks that make up an I-frame is the same as that used in the JPEG standard to encode each 8 x 8 block of pixels.
    - Implementation issues:
      - I-frame encoding is the same as the JPEG implementation: FDCT, quantization, entropy encoding.
      - Assuming 4 blocks for the luminance and 2 blocks for the chrominance, each macroblock (MB) requires six 8x8 pixel blocks to be encoded.
  • 68. Implementation Schematic – P-frames
    In the case of P-frames, the encoding of each macroblock depends on the output of the motion estimation (ME) unit, which in turn depends on the contents of the MB being encoded (target frame) and the contents of the macroblock in the search area (reference frame) that produces the closest match. There are three possibilities:
    - If the two contents are the same, only the address of the macroblock in the reference frame is encoded.
    - If the two contents are very close, both the motion vector and the difference matrices associated with the macroblock in the reference frame are encoded.
    - If no close match is found, the target macroblock is encoded in the same way as a macroblock in an I-frame.
  • 69. To carry out its role, the motion estimation unit, which contains the search logic, uses a copy of the reference frame as decoded from its quantized DCT coefficients, i.e. the same reference frame the decoder will reconstruct.
  • 70. Implementation schematic – B-frames: the same procedure is followed for encoding B-frames, except that both the preceding (reference) frame and the succeeding frame to the target frame are involved.
  • 71. Macroblock encoded bitstream format: for each macroblock it is necessary to identify the type of encoding that has been used. This is the role of the formatter. The fields are:
    - Type: indicates the type of frame encoded (I, P or B)
    - Address: identifies the location of the macroblock in the frame
    - Quantization value: the value used to quantize all the DCT coefficients in the macroblock
    - Motion vector: the encoded vector
    - Block representation: indicates which of the six 8x8 blocks that make up the macroblock are present
    - B1, B2, ..., B6: JPEG-encoded DCT coefficients for the blocks present
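The fields listed above can be pictured as a simple record. The class and field names are hypothetical, and the real MPEG syntax packs these fields with variable-length codes rather than storing them like this. The sketch shows one plausible way the "block representation" field could be derived: a 6-bit mask with B1 in the most significant bit, where a bit is set when that block has at least one non-zero quantized coefficient and therefore has to be sent.

```python
from dataclasses import dataclass, field


@dataclass
class MacroblockHeader:
    """HYPOTHETICAL container for the per-macroblock fields on the slide."""
    frame_type: str                 # "I", "P" or "B"
    address: int                    # location of the macroblock in the frame
    quantizer: int                  # quantization value for all its DCT coefficients
    motion_vector: tuple = (0, 0)   # (dx, dy); unused for I-frames
    blocks: list = field(default_factory=list)  # six quantized 8x8 blocks B1..B6

    def block_pattern(self) -> int:
        """6-bit 'block representation' mask, B1 in the most significant bit."""
        pattern = 0
        for blk in self.blocks:
            present = any(v != 0 for row in blk for v in row)
            pattern = (pattern << 1) | int(present)
        return pattern


zero = [[0] * 8 for _ in range(8)]
nonzero = [[0] * 8 for _ in range(8)]
nonzero[0][0] = 5
mb = MacroblockHeader("P", 12, 8, (2, 1), [nonzero, zero, zero, nonzero, zero, nonzero])
print(bin(mb.block_pattern()))  # 0b100101
```

Blocks whose mask bit is clear (for example, a perfectly predicted chrominance block) contribute no coefficient data at all.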
  • 72. MPEG (Moving Picture Experts Group)
    - A committee of experts, established in 1988, that develops video encoding standards.
    - Until recently, it was the only game in town (and it is still by far the most popular).
    - Suitable for a wide range of videos: low to high resolution, slow movement to fast action.
    - Can be implemented either in software or in hardware.
  • 73. MPEG:  MPEG-1 ISO Recommendation 11172  Source intermediate digitization format (SIF) is used.  Uses resolution of 352x288 pixels and used for VHS quality audio and video on CD-ROM at a bit rate of 1.5 Mbps  MPEG-2 ISO Recommendation 13818  Used in recording and transmission of studio quality audio and video.  Different levels of video resolution possible Low: 352X288 comparable with MPEG-1 Main: 720X 576 pixels studio quality video and audio, bit rate up to 15 Mbps High: 1920X1152 pixels used in wide screen HDTV bit rate of up to 80Mbps are possible
  • 74. MPEG-4: used for interactive multimedia applications over the Internet and over various entertainment networks.
    - The MPEG-4 standard contains features that enable a user not only to access a video sequence passively, using for example start/stop controls, but also to manipulate the individual elements that make up a scene within a video.
    - In MPEG-4, each video frame is segmented into a number of video object planes (VOPs), each of which corresponds to an audio-visual object (AVO) of interest.
  • 75. MPEG-1 • Uses a video compression technique similar to that of H.261; the digitization format used is the source intermediate format (SIF), with progressive scanning at a refresh rate of 30 Hz (for NTSC) and 25 Hz (for PAL).
  • 76. Performance
    - Compression ratios for I-frames are similar to those of JPEG for video, typically 10:1 to 20:1 depending on the complexity of the frame contents.
    - P- and B-frames achieve higher compression: in the region of 20:1 to 30:1 for P-frames and 30:1 to 50:1 for B-frames.
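These per-frame ratios combine into an overall figure for a whole GOP. The sketch assumes every frame has the same uncompressed size and uses illustrative mid-range ratios picked from the ranges above (15:1 for I, 25:1 for P, 40:1 for B); they are not measured values.

```python
def gop_compression_ratio(pattern: str, ratios: dict) -> float:
    """Overall compression ratio of a GOP such as "IBBPBBPBBPBB".

    ASSUMPTION: all frames have the same uncompressed size, and a frame of
    type t shrinks by ratios[t]."""
    compressed = sum(1 / ratios[t] for t in pattern)  # in units of one raw frame
    return len(pattern) / compressed


ratio = gop_compression_ratio("IBBPBBPBBPBB", {"I": 15, "P": 25, "B": 40})
print(round(ratio, 1))  # 31.0
```

The B-frames dominate the result: eight of the twelve frames are B-frames, so the GOP as a whole compresses about twice as well as its I-frame alone.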
  • 77. Video Compression – MPEG-1 video bitstream structure: composition • The compressed bitstream produced by the video encoder is hierarchical: at the top level is the complete compressed video (the sequence), which consists of a string of groups of pictures.
  • 78. Video Compression – MPEG-1 video bitstream structure: format • In order for the decoder to decompress the received bitstream, each data structure must be clearly identified within the bitstream
  • 79. Video Compression – MPEG-4 coding principles • Content-based video coding principles, showing how a frame/scene is defined in the form of multiple video object planes.
  • 80. Video Compression – MPEG – 4 encoder/decoder schematic • Before being compressed each scene is defined in the form of a background and one or more foreground audio-visual objects (AVOs)
  • 81. Video Compression – MPEG VOP encoder The audio associated with an AVO is compressed using one of the algorithms described before and depends on the available bit rate of the transmission channel and the sound quality required