H.264 Library
Detailed Reference for Verification and Design Exploration
H.264 Overview
Broadcast television and home entertainment have been revolutionized by the advent of digital TV and DVD-Video. These applications, and many more, were made possible by the standardization of video compression technology. The recent standard in the MPEG (Moving Picture Experts Group) series, MPEG-4, is enabling a new generation of internet-based video applications, while the ITU-T H.263 standard for video compression is now widely used in video conferencing systems.

H.264/AVC (also known as MPEG-4 Part 10), the result of the collaboration between the ISO/IEC MPEG group and the ITU-T Video Coding Experts Group (VCEG), is the latest video coding standard. The goals of this standardization effort were enhanced compression efficiency and a network-friendly video representation for both interactive applications (video telephony) and non-interactive applications (broadcast, storage media, etc.). H.264/AVC provides gains in compression efficiency of up to 50% over a wide range of bit rates and video resolutions compared to previous standards. Also compared to previous standards, the decoder complexity is about four times that of MPEG-2 and twice that of the MPEG-4 Visual Simple Profile.

Relative to prior video coding standards, H.264/AVC introduces the following changes:
• An adaptive deblocking filter is applied inside the prediction loop to reduce blocking artifacts.
• A prediction scheme called intra prediction exploits spatial redundancy: data from previously processed macroblocks of the current frame is used to predict the data of the current macroblock.
• Previous video coding standards use an 8x8 real discrete cosine transform (DCT) to exploit the spatial redundancy within each 8x8 block of image data. H.264/AVC uses a smaller 4x4 integer DCT, which significantly reduces the ringing artifacts associated with the transform.
• Block sizes from 16x16 down to 4x4 can be used for motion compensated prediction.
• Previous video coding standards used at most half-pixel accuracy for motion estimation; H.264/AVC supports quarter-pixel accuracy.
• Inter prediction can use multiple reference frames for block-based motion compensated prediction.
• Context-adaptive variable length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC) are used for entropy encoding/decoding, improving compression by 10% compared to previous schemes.

H.264 Baseline Profile Video Decoder
Compressed H.264 bit-stream data is available on a slice-by-slice basis, where a slice is a group of macroblocks usually processed in raster scan order. Two slice types are supported in the H.264 baseline profile. In an I-slice, all macroblocks are encoded in intra mode. In a P-slice, some macroblocks are predicted using motion compensated prediction from one reference frame chosen among the set of reference frames, and some macroblocks are encoded in intra mode. The H.264 decoder processes the data macroblock by macroblock. Depending on its characteristics, each macroblock is reconstructed from its predicted part and its residual (error) part, which is coded using CAVLC.

Figure 1 shows the overall block diagram of an H.264 baseline profile video decoder. The H.264 bit stream first passes through the “slice header parsing” block, which extracts the information about each slice. In H.264 video coding, each macroblock is categorized as either coded or skipped. If the macroblock is skipped, it is completely reconstructed using the inter prediction module, and its residual information is zero. If the macroblock is coded, then, depending on its prediction mode, it passes through the “Intra 4x4 prediction” block, the “Intra 16x16 prediction” block, or the “Inter prediction” block.

Figure 1: Block diagram of H.264 baseline profile video decoder. (Blocks shown: slice header parsing, macro block parsing, sub MB parsing, inter prediction with the skip MB path, intra 4x4 prediction, intra 16x16 prediction, CAVLC decoding, scale & inverse transform, and deblocking filter, grouped into a prediction path and a residual path.)
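The skip/coded routing described above can be sketched in C. This is a minimal illustration under stated assumptions, not the library's implementation: the names `MbKind`, `PredPath`, `route_mb`, and `mb_allowed` are hypothetical, and the slice rule encodes only what the text states (I-slices contain only intra macroblocks; P-slices may also contain inter and skipped macroblocks).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical macroblock categories named after the Figure 1 blocks.
   Illustrative only; not the CoWare library's API. */
typedef enum { MB_SKIP, MB_INTRA_4X4, MB_INTRA_16X16, MB_INTER } MbKind;
typedef enum { PRED_INTER, PRED_INTRA_4X4, PRED_INTRA_16X16 } PredPath;

typedef struct {
    PredPath pred;     /* prediction block that handles the macroblock */
    bool run_residual; /* false: residual is zero, residual path is bypassed;
                          true: residual path runs (coefficients may still be zero) */
} MbRoute;

/* Baseline profile slice rule: an I-slice holds only intra macroblocks,
   a P-slice may hold skipped, inter, and intra macroblocks. */
bool mb_allowed(char slice_type, MbKind kind) {
    assert(slice_type == 'I' || slice_type == 'P');
    if (slice_type == 'I')
        return kind == MB_INTRA_4X4 || kind == MB_INTRA_16X16;
    return true;
}

/* Route one macroblock along the prediction and residual paths. */
MbRoute route_mb(MbKind kind) {
    MbRoute r = { PRED_INTER, true };
    switch (kind) {
    case MB_SKIP:        r.pred = PRED_INTER; r.run_residual = false; break;
    case MB_INTRA_4X4:   r.pred = PRED_INTRA_4X4;   break;
    case MB_INTRA_16X16: r.pred = PRED_INTRA_16X16; break;
    case MB_INTER:       r.pred = PRED_INTER;       break;
    }
    return r;
}
```

A skipped macroblock therefore shares the inter prediction block with coded inter macroblocks and differs only in bypassing CAVLC decoding and the inverse transform.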
The output macroblock is reconstructed using the prediction output from the prediction module and the residual output from the “scale and transform” module. Once all the macroblocks in a frame are reconstructed, the deblocking filter is applied to the entire frame.
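As a sketch of this reconstruction step for one 4x4 luma block, the C code below applies the H.264 4x4 inverse integer transform (row butterflies, then column butterflies, then rounding with (x + 32) >> 6) to coefficients that are assumed to be already scaled (inverse quantized), adds the prediction, and clips to 8 bits. The function names are hypothetical; 8-bit samples and an arithmetic right shift for negative values (as the standard's >> operator specifies) are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Clip a reconstructed sample to the 8-bit range [0, 255]. */
static uint8_t clip_pixel(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One 1-D pass of the H.264 4x4 inverse core transform on four values
   spaced `stride` apart (stride 1 = a row, stride 4 = a column).
   Assumes >> is an arithmetic shift for negative ints. */
static void inv_butterfly(int *p, int stride) {
    int d0 = p[0], d1 = p[stride], d2 = p[2 * stride], d3 = p[3 * stride];
    int e0 = d0 + d2;
    int e1 = d0 - d2;
    int e2 = (d1 >> 1) - d3;
    int e3 = d1 + (d3 >> 1);
    p[0]          = e0 + e3;
    p[stride]     = e1 + e2;
    p[2 * stride] = e1 - e2;
    p[3 * stride] = e0 - e3;
}

/* Hypothetical helper: out = clip(pred + residual), where the residual is
   the inverse transform of the scaled coefficients `coef` (row-major). */
void reconstruct_4x4(const int *coef, const uint8_t *pred, uint8_t *out) {
    assert(coef && pred && out);
    int blk[16];
    for (int i = 0; i < 16; i++) blk[i] = coef[i];
    for (int row = 0; row < 4; row++) inv_butterfly(&blk[row * 4], 1); /* horizontal */
    for (int col = 0; col < 4; col++) inv_butterfly(&blk[col], 4);     /* vertical */
    for (int i = 0; i < 16; i++) {
        int residual = (blk[i] + 32) >> 6; /* final rounding and scaling */
        out[i] = clip_pixel(pred[i] + residual);
    }
}
```

For example, a block whose only non-zero scaled coefficient is a DC value of 64 yields a residual of 1 at every position, so a flat prediction of 100 reconstructs to 101 everywhere.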
Figure 2: CoWare Signal Processing Designer view of H.264 baseline decoder operation.

The “macro block parsing module” parses the information related to the macroblock, such as the prediction type, the number of coded blocks in the macroblock, the partition type, and the motion vectors. When a macroblock is coded as an inter macroblock and is split into sub-macroblocks, the “sub macro block” parsing module parses the sub-macroblock information; each 8x8 sub-macroblock is partitioned into blocks of size 8x8, 8x4, 4x8, or 4x4. If the macroblock is not split into sub-macroblocks, any of the three prediction types (Intra16x16, Intra4x4, or Inter) can be used.
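To make the sub-macroblock sizes concrete, the sketch below counts how many motion vectors a macroblock split into four 8x8 sub-macroblocks carries. It assumes baseline-profile P-slices, where each partition has a single motion vector; the helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: number of partitions (one motion vector each in a
   baseline P-slice) inside an 8x8 sub-macroblock split into w x h blocks. */
int sub_mb_partitions(int w, int h) {
    assert((w == 8 || w == 4) && (h == 8 || h == 4));
    return (8 / w) * (8 / h);
}

/* Hypothetical helper: total motion vectors for a macroblock split into
   four 8x8 sub-macroblocks, given each one's partition width and height. */
int mb_motion_vectors(const int w[4], const int h[4]) {
    int total = 0;
    for (int i = 0; i < 4; i++)
        total += sub_mb_partitions(w[i], h[i]);
    return total;
}
```

A macroblock whose four sub-macroblocks use 8x8, 8x4, 4x8, and 4x4 partitions carries 1 + 2 + 2 + 4 = 9 motion vectors; splitting all four into 4x4 blocks gives the maximum of 16.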
In the inter prediction module, the motion compensated blocks are predicted from previous frames that have already been decoded.

Intra prediction means that the samples of a macroblock are predicted from already transmitted macroblocks of the same picture. In H.264/AVC, two types of intra prediction are available for coding the luminance component of a macroblock: INTRA_4x4 mode and INTRA_16x16 mode. In INTRA_4x4 mode, the 16x16 macroblock is divided into 4x4 blocks, and each block is predicted individually using one of the nine available prediction modes. In INTRA_16x16 mode, prediction is carried out at the macroblock level using one of the four available prediction modes. Intra prediction for the chrominance components of a macroblock is similar to INTRA_16x16 prediction of the luminance component.

The H.264/AVC baseline profile video decoder uses the CAVLC entropy coding method to decode the quantized residual transform coefficients. In the CAVLC module, the number of non-zero quantized transform coefficients and the actual size and position of each coefficient are decoded separately. The tables used to decode these parameters change adaptively depending on previously decoded syntax elements. After decoding, the coefficients are inverse zigzag scanned to form 4x4 blocks, which are passed to the “scale and inverse transform module”.

In the “scale and inverse transform module”, inverse quantization and inverse transformation are applied to the decoded coefficients to form the residual data that is combined with the prediction. Three types of transforms are used in the H.264 standard:
• a 4x4 inverse integer discrete cosine transform (DCT), used to form the residual blocks of both the luminance and chrominance components;
• a 4x4 inverse Hadamard transform, used to form the DC coefficients of the 16 luminance blocks of INTRA_16x16 macroblocks;
• a 2x2 inverse Hadamard transform, used to form the DC coefficients of the chrominance blocks.

The 4x4 block transform and motion compensated prediction can be sources of blocking artifacts in the decoded image. Filtering the block edges improves the final visual quality of the decoded image, so the H.264 standard uses an in-loop deblocking filter to remove the blocking artifacts.

CoWare Signal Processing Designer Library of H.264 Baseline Profile Video Decoder
The Signal Processing Designer library of the H.264 baseline profile video decoder system, developed by iDeaWorks, is shown in Figure 2.

The decoder takes an H.264 baseline profile elementary bit stream from a source as NAL units, decodes the compressed video stream, and displays and dumps the decoded frames. Frame width, frame height, and NAL unit size can be set using global parameters. There are nine major blocks in the top-level hierarchy of the H.264 video decoder, which are:
Source Block
This block converts an H.264 baseline profile elementary bit stream into fixed-size frame NAL units. Each frame NAL unit contains all the NAL units of a single frame. The block takes the H.264 bit stream file name and the frame NAL unit size as parameters. Its outputs are the frame NAL unit, as a vector of the size specified by the input parameter, and an end of sequence (EOS) flag.

Parser Block
This module parses the following set of information:
• Control data required for intra and inter prediction
• Motion compensation data required for quarter-pixel interpolation
• Quantized residual transform coefficients
• Control parameters for performing scale and transform
• Control data required for applying the deblocking filter to the decoded image
• Control data required for display re-ordering
The parser works at frame level.

Frame to Macroblock Conversion Block
This is a multirate block that converts frame-level parameters to macroblock-level parameters. It is driven by the parser block. The output rate is ((Frame Width x Frame Height)/256) times the input rate, that is, one output per 16x16 macroblock.

Macroblock Process Block
The macroblock process block consists of four major modules:
• Scale and transform module
• Prediction module
• Deblocking module
• Intra feedback data module
This block operates at macroblock level.

Frame Construct Block
This is a multirate block whose input arrives as macroblocks (MBs) and whose output leaves as a frame. The block takes vectors of 532 bytes and forms a single frame. It contains three modules:
• Polymorphic vector-to-scalars conversion block
• Polymorphic scalars-to-vector conversion block
• Pur filter data block

Display Reorder Block
This block reorders the decoded YUV display data and also crops the output YUV data according to the cropping parameters. It takes decoded frame data, display re-order control data, the end of sequence flag, and the cropping parameters as inputs, and outputs the display frame data together with a flag indicating whether the display frame data is valid.

Sink Block
This block dumps the decoded YUV data into an output file. Its inputs are the display frame data, a flag that indicates the validity of the frame data, control data for cropping, and the end of sequence flag. The output file name can be set using the parameters of the sink block.

H.264 Library Use Models
The H.264 Baseline Video Decoder was developed as an optimized, architecture-neutral reference standard for users incorporating advanced video decoding in embedded or stand-alone products. It consists of a fully decomposed, C-based software algorithm encapsulated within CoWare Signal Processing Designer. This environment ensures that users can use this product as a reference tool for FPGA, ASIC, SoC, or embedded IC implementations. The algorithms have been decomposed down to the “leaf”, or lowest common, functions. Parallelism and concurrency are fully exposed in the CoWare Signal Processing Designer H.264 Design Library and the CoWare Signal Processing Designer H.264 Source Library.

The algorithm model meets these industry standards:
• ITU-T H.264 standard, or
• ISO/IEC MPEG-4 Part 10 standard

Included with the library are reference test benches consisting of standard media video feeds capable of exercising the algorithm and any implementation of the reference library. The decoder library is qualified with the JVT standards compliance video streams.

The following packages are available:
• CoWare Signal Processing Designer H.264 Reference Library
  – H.264 models as partitioned verification reference
  – Reference media streams
  – Value: validated, partitioned executable reference inside the CoWare flow for customer H.264 algorithm optimization
• CoWare Signal Processing Designer H.264 Design Library
  – Same as the reference library, but down to a very low level of granularity
  – Reference module test benches
  – Value: verification and detailed reference for the design and optimization of sub-functions
  – Documentation: detailed algorithm documentation describing control, dataflow, and memory requirements
  – Test Vectors: the scaling transform, prediction, and deblocking filter modules have a full ten frames of test vectors for QCIF, CIF, and HD 720p resolutions
• CoWare Signal Processing Designer H.264 Source Library
  – Requires H.264 Design Library
  – Source code license agreement
  – Specific jump-start service package included
  – All leaf-level blocks as source code
  – Value: working C code for embedded implementation or a starting point for detailed RTL design
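As a worked check of the Frame to Macroblock Conversion Block's rate factor, ((Frame Width x Frame Height)/256), the hypothetical helper below computes the number of 16x16 macroblocks per frame for the QCIF (176x144), CIF (352x288), and HD 720p (1280x720) resolutions covered by the test vectors. It assumes frame dimensions that are multiples of 16.

```c
#include <assert.h>

/* Hypothetical helper: macroblock-level outputs per frame, i.e. the
   (Frame Width x Frame Height) / 256 rate factor of the conversion block.
   Assumes both dimensions are multiples of 16 (whole macroblocks). */
int macroblocks_per_frame(int width, int height) {
    assert(width % 16 == 0 && height % 16 == 0);
    return (width * height) / 256;
}
```

These resolutions give 99, 396, and 3600 macroblock-level outputs per frame, respectively.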