EE 5359 PROPOSAL
H.264 to VC-1 TRANSCODING
Vidhya Vijayakumar
Student I.D.: 1000-622152
Date: September 24, 2009
H.264 to VC-1 TRANSCODER
OBJECTIVE:
The objective of the thesis is to implement an H.264 bitstream to VC-1
transcoder for progressive compression.
MOTIVATION:
The adoption of high definition video has grown rapidly over the last five
years. The high definition DVD format Blu-ray has mandated MPEG-2 [3], H.264 [2]
and VC-1 [1] as video compression formats. The coexistence of these different video
coding standards creates a need for transcoding. As more and more end products use
the above standards, transcoding from one format to another adds value to the
product’s capability. While there has been recent work on MPEG-2 to H.264
transcoding [3] and VC-1 to H.264 transcoding [4], published work on H.264 to VC-1
transcoding is nearly non-existent. This motivates the development of a
transcoder that can efficiently transcode an H.264 bitstream to a VC-1 bitstream.
DETAILS:
Video transcoding is the operation of converting video from one format to
another [5]. A format is defined by characteristics such as bit-rate and spatial
resolution. One of the earliest applications of transcoding is to adapt the bit-rate of a
compressed stream to the channel bandwidth for universal multimedia access over all
kinds of channels, such as wireless networks, the Internet and dial-up networks. Changes in
the characteristics of an encoded stream, such as bit-rate, spatial resolution and quality, can
also be achieved by scalable video coding [5]. However, in cases where the available
network bandwidth is insufficient or fluctuates with time, it may be difficult to set
the base layer bit-rate. In addition, scalable video coding demands additional
complexity at both the encoder and the decoder.
The basic architecture for converting an H.264 bitstream into a VC-1
elementary stream is complete decoding of the H.264 stream followed by
re-encoding into a VC-1 stream. However, this involves significant computational
complexity [6]. Hence there is also a need to transcode at low complexity.
Transcoding can in general be implemented in the spatial domain or in the
transform domain or in a combination of the two domains. The common transcoding
architectures [5] are:
Open loop transform domain transcoding
Fig. 1 Open loop transform domain transcoder architecture [5]
Open loop transcoders are computationally efficient (Fig 1). They operate in the DCT
domain. However, they are subject to drift error, which arises from rounding,
quantization loss and clipping functions and accumulates along the prediction chain.
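The accumulation of drift can be illustrated with a toy one-dimensional predictive coder. The sketch below is not taken from [5]; the scalar samples, the step sizes and the helper names (quantize, encode_closed_loop, decode) are illustrative assumptions. It requantizes the residuals of a closed-loop encoder without rebuilding the prediction (open loop) and compares the result with a full decode followed by re-encoding (cascaded).

```python
def quantize(value, step):
    """Uniform quantizer: round value to the nearest multiple of step."""
    return ((value + step // 2) // step) * step


def encode_closed_loop(samples, step):
    """Predict each sample from the previous reconstruction, quantize the residual."""
    residuals, recon = [], 0
    for s in samples:
        r = quantize(s - recon, step)
        residuals.append(r)
        recon += r  # the encoder tracks what the decoder will see
    return residuals


def decode(residuals):
    """Accumulate residuals to rebuild the samples."""
    out, recon = [], 0
    for r in residuals:
        recon += r
        out.append(recon)
    return out


# Hypothetical input: a slow ramp, first coded with a fine step (2).
source = [10, 13, 16, 19, 22, 25, 28, 31]
fine = encode_closed_loop(source, 2)
reference = decode(fine)

# Open loop: requantize the residuals directly with a coarse step (8).
open_loop = decode([quantize(r, 8) for r in fine])
# Cascaded: decode fully, then re-encode with its own prediction loop.
cascaded = decode(encode_closed_loop(reference, 8))

open_err = [abs(a - b) for a, b in zip(reference, open_loop)]
casc_err = [abs(a - b) for a, b in zip(reference, cascaded)]
print("open loop error:", open_err)
print("cascaded error :", casc_err)
```

In this toy run the open loop error grows along the prediction chain, while the cascaded error stays within half of the coarse step.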
Cascaded Pixel Domain Architecture (CPDT)
Fig. 2 Cascaded pixel domain transcoder architecture [5]
This is the most basic transcoding architecture (Fig 2). The motion vectors from the
incoming bit stream are extracted and reused. Thus the motion estimation block,
which accounts for 60% of the encoder computation, is eliminated.
Unlike the previous architecture, CPDT is drift free. Hence, even though it is
slightly more complex, it is suited for heterogeneous transcoding between different
standards, where basic parameters such as mode decisions and motion vectors are to
be re-derived.
Simplified DCT Domain transcoders (SDDT)
Fig. 3 Simplified transform domain transcoder architecture [5]
This transcoder is based on the assumption that DCT, IDCT and motion compensation
are linear processes (Fig 3). This architecture requires that motion compensation be
performed in the DCT domain, which is a computationally intensive operation
[3]. For instance, as shown in Fig. 4, the goal is to compute the DCT
coefficients of the target block B from the four overlapping blocks B1, B2, B3 and
B4.
Fig. 4 Transform domain motion compensation illustration [5]
Also, clipping functions and rounding operations performed for interpolation
in fractional pixel motion compensation lead to a drift in the transcoded video.
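The extraction of the target block B from B1, B2, B3 and B4 can be written as a sum of products with shift matrices, B = sum of H_i1 * B_i * H_i2. The sketch below checks this decomposition in the pixel domain. Because an orthonormal DCT is linear, the same sum holds with every matrix replaced by its DCT, which is why transform domain motion compensation costs several matrix products per block. The block size N = 4, the layout of the four blocks and all function names are assumptions made for illustration.

```python
N = 4  # block size used here for brevity; the codecs discussed use 8


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def upper_shift(k):
    """Matrix with ones where col == row + k (identity when k == 0)."""
    return [[1 if c == r + k else 0 for c in range(N)] for r in range(N)]


def lower_shift(k):
    """Matrix with ones where row == col + k (identity when k == 0)."""
    return [[1 if r == c + k else 0 for c in range(N)] for r in range(N)]


def extract_block(b1, b2, b3, b4, dy, dx):
    """Block at offset (dy, dx) inside the 2N x 2N area [[b1, b2], [b3, b4]]."""
    terms = [
        matmul(matmul(upper_shift(dy), b1), lower_shift(dx)),
        matmul(matmul(upper_shift(dy), b2), upper_shift(N - dx)),
        matmul(matmul(lower_shift(N - dy), b3), lower_shift(dx)),
        matmul(matmul(lower_shift(N - dy), b4), upper_shift(N - dx)),
    ]
    total = terms[0]
    for t in terms[1:]:
        total = add(total, t)
    return total


# Hypothetical reference area with distinct values, split into four blocks.
area = [[r * 2 * N + c for c in range(2 * N)] for r in range(2 * N)]
b1 = [row[:N] for row in area[:N]]
b2 = [row[N:] for row in area[:N]]
b3 = [row[:N] for row in area[N:]]
b4 = [row[N:] for row in area[N:]]

dy, dx = 1, 3
expected = [[area[dy + r][dx + c] for c in range(N)] for r in range(N)]
print(extract_block(b1, b2, b3, b4, dy, dx) == expected)
```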
Cascaded DCT Domain transcoders (CDDT)
Fig. 5 Cascaded transform domain transcoder architecture [5]
This is used for spatial/temporal resolution downscaling and other coding parameter
changes (Fig 5). Compared with SDDT, greater flexibility is achieved by
introducing another transform domain motion compensation block; however, it is far
more computationally intensive and requires more memory [3]. It is often applied to
downscaling applications, where the memory required at the encoder end is modest
because of the downscaled resolution.
Choice of basic transcoder architecture:
DCT domain transcoders have the main drawback that motion compensation
in the transform domain is very computationally intensive. DCT domain transcoders are
also less flexible than pixel domain transcoders. For instance, the SDDT
architecture can only be used for bit rate reduction transcoding. It assumes that the
spatial and temporal resolutions stay the same and that the output video uses the same
frame types, mode decisions and motion vectors as the input video.
For H.264 to VC-1 transcoding, several changes are required in
order to accommodate the mismatches between the two standards. For instance, for
motion estimation and compensation, H.264 supports 16x16, 16x8, 8x16, 8x8, 8x4,
4x8 and 4x4 macroblock partitions (Fig 6), but VC-1 supports 16x16 and 8x8 only (Fig
7). The transform sizes and types (8x8 and 4x4 in H.264 and 8x8, 4x8, 8x4 and 4x4 in
VC-1) are different and make transform domain transcoding prohibitively complex.
Hence, DCT domain transcoders are not well suited to this task.
Fig.6 Segmentations of the macroblock for motion compensation in H.264
Top: segmentation of macroblocks, bottom: segmentation of 8x8 partitions. [2]
Fig.7 Segmentations of the macroblock for motion compensation in VC-1 [2]
From Fig. 8, it can be inferred that the cascaded pixel domain architecture
outperforms the DCT domain transcoders. Also, for larger GOP sizes, the drift in DCT
domain transcoders becomes more significant.
Fig.8 PSNR vs Bit-rate graph for the Foreman sequence transcoded with a GOP size 15, using
different transcoding architectures as described in Figs. 1, 2, 3 and 5. [5]
Hence, heterogeneous transcoding in the pixel domain is preferred for
standards transcoding.
Standards transcoding:
When transcoding between two different standards, the main factor involved is
compatibility between the profile and level of the input stream and those of the output
stream for a specific purpose. The goal here is to transcode an H.264 bitstream of
Baseline profile to a VC-1 bitstream of Simple profile.
Table 1 compares and contrasts the characteristics of both standards.
                                      H.264 High Profile                        VC-1 Main Profile
Chroma format                         4:2:0                                     4:2:0
Picture coding type                   I, P, B                                   I, P, B
Transform size                        4x4, 8x8                                  8x8, 4x8, 8x4, 4x4
Intra prediction                      Directional predictors                    None
Block sizes for motion compensation   16x16, 16x8, 8x16, 8x8, 4x8, 8x4, 4x4     16x16, 8x8
Table 1 Main characteristics of H.264 High profile and VC-1 Main profile
Overview of H.264:
H.264 [2] is a standard for video compression, and is equivalent to
MPEG-4 Part 10, or MPEG-4 AVC (for advanced video coding). As of 2008,
it is the latest block-oriented motion-compensation-based video standard developed
by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC
Moving Picture Experts Group (MPEG), and it was the product of a partnership effort
known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC
MPEG-4 Part 10 standard (formally, ISO/IEC 14496-10) are jointly maintained so
that they have identical technical content. The encoder and decoder are shown in
Figs 9 and 10.
Fig 9 H.264 Encoder [32]
Fig 10. H.264 Decoder [32]
The standardization of the first version of H.264/AVC was completed in May
2003. The JVT then developed extensions to the original standard that are known as
the Fidelity Range Extensions (FRExt) [29]. These extensions enable higher quality
video coding by supporting increased sample bit depth precision and higher-resolution
color information, including sampling structures known as YUV 4:2:2 and YUV
4:4:4. Several other features are also included in the Fidelity Range Extensions
project, such as adaptive switching between 4×4 and 8×8 integer transforms, encoder-
specified perceptual-based quantization weighting matrices, efficient inter-picture
lossless coding, and support of additional color spaces. The design work on the
Fidelity Range Extensions was completed in July 2004, and the drafting work on them
was completed in September 2004.
Scalable video coding (SVC) [30] as specified in Annex G of H.264/AVC
allows the construction of bitstreams that contain sub-bitstreams that conform to
H.264/AVC. For temporal bitstream scalability, i.e., the presence of a sub-bitstream
with a smaller temporal sampling rate than the bitstream, complete access units are
removed from the bitstream when deriving the sub-bitstream. In this case, high-level
syntax and inter prediction reference pictures in the bitstream are constructed
accordingly. For spatial and quality bitstream scalabilities, i.e., the presence of a
sub-bitstream with lower spatial resolution or quality than the bitstream, network
abstraction layer (NAL) units are removed from the bitstream when deriving the
sub-bitstream. In this case, inter-layer prediction, i.e., the prediction of the higher spatial
resolution or quality signal by data of the lower spatial resolution or quality signal, is
typically used for efficient coding. The Scalable Video Coding extension was
completed in November 2007.
Some of the features adopted in H.264 for enhancement of prediction, improved
coding efficiency and robustness to data errors/losses are listed as follows.
Features for enhancement of prediction
• Directional spatial prediction for intra coding
• Variable block-size motion compensation with small block size
Figure 11 – Various block sizes in H.264
• Quarter-sample-accurate motion compensation
• Motion vectors over picture boundaries
• Multiple reference picture motion compensation
• Decoupling of referencing order from display order
• Decoupling of picture representation methods from picture referencing
capability
• Weighted prediction
• Improved “skipped” and “direct” motion inference
• In-the-loop deblocking filtering
Features for improved coding efficiency
• Small block-size transform
• Exact-match inverse transform
Figure – Forward 4x4 and 8x8 integer transform
• Short word-length transform
• Hierarchical block transform
• Arithmetic entropy coding
• Context-adaptive entropy coding
Features for robustness to data errors/losses
• Parameter set structure
• NAL unit syntax structure
• Flexible slice size
• Flexible macroblock ordering (FMO)
• Arbitrary slice ordering (ASO)
• Redundant pictures
• Data partitioning
• SP/SI synchronization/switching pictures
Profiles in H.264
The H.264 standard defines numerous profiles:
• Constrained baseline profile
• Baseline profile
• Main profile
• Extended profile
• High profile
• High 10 profile
• High 4:2:2 profile
• High 4:4:4 predictive profile
• High stereo profile
• High 10 intra profile
• High 4:2:2 intra profile
• High 4:4:4 intra profile
• CAVLC 4:4:4 intra profile
• Scalable baseline profile
• Scalable high profile
• Scalable high intra profile
Table Features in baseline, main and extended profile
Table Features in high profile
[Diagram labels: Baseline Profile, Main Profile, Extended Profile, High Profiles; I slice, P slice, B slice, SP slice, SI slice, CAVLC, CABAC, weighted prediction, data partition, arbitrary slice order, flexible macroblock order, redundant slice, adaptive transform block size, quantization scaling matrices]
Figure 12 Comparison of H.264 baseline, main, extended and high profile
Overview of VC-1
VC-1 [1] is the informal name of the SMPTE 421M video codec standard
initially developed by Microsoft. It was released on April 3, 2006 by SMPTE. It is
now a supported standard for Blu-ray Discs and Windows Media Video 9.
VC-1 is an evolution of the conventional DCT-based video codec design also
found in H.261 [31], H.263 [27], MPEG-1 [41] and MPEG-2 [3]. It is widely
characterized as an alternative to the latest ITU-T and MPEG video codec standard
known as H.264/MPEG-4 AVC. VC-1 contains coding tools for interlaced video
sequences as well as progressive encoding. The main goal of VC-1 development and
standardization is to support the compression of interlaced content without first
converting it to progressive, making it more attractive to broadcast and video industry
professionals.
The VC-1 codec is designed to achieve state-of-the-art compressed video
quality at bit rates that may range from very low to very high. The codec can easily
handle 1920 pixel × 1080 pixel resolution at 6 to 30 megabits per second (Mbps) for
high-definition video. VC-1 is capable of higher resolutions such as 2048 pixels ×
1536 pixels for digital cinema, and of a maximum bit rate of 135 Mbps. An example
of very low bit rate video would be 160 pixel × 120 pixel resolution at 10 kilobits per
second (Kbps) for modem applications.
The basic functionality of VC-1 involves a block-based motion compensation
and spatial transform scheme similar to that used in other video compression
standards such as MPEG-1 and H.261 [31]. However, VC-1 includes a number of
innovations and optimizations that make it distinct from the basic compression
scheme, resulting in excellent quality and efficiency. VC-1 Advanced Profile is also
transport independent. This provides even greater flexibility for device manufacturers
and content services.
Fig. 11 VC-1 Codec [32]
Profiles in VC-1
VC-1 defines three profiles
1. Simple
2. Main
3. Advanced
Feature                                    Simple   Main   Advanced
Baseline intra frame compression           Yes      Yes    Yes
Variable-sized transform                   Yes      Yes    Yes
16-bit transform                           Yes      Yes    Yes
Overlapped transform                       Yes      Yes    Yes
4 motion vector per macroblock             Yes      Yes    Yes
¼ pixel luminance motion compensation      Yes      Yes    Yes
¼ pixel chrominance motion compensation    No       Yes    Yes
Start codes                                No       Yes    Yes
Extended motion vectors                    No       Yes    Yes
Loop filter                                No       Yes    Yes
Dynamic resolution change                  No       Yes    Yes
Adaptive macroblock quantisation           No       Yes    Yes
B frames                                   No       Yes    Yes
Intensity compensation                     No       Yes    Yes
Range adjustment                           No       Yes    Yes
Field and frame coding modes               No       No     Yes
GOP Layer                                  No       No     Yes
Display metadata                           No       No     Yes
Table – Features in VC-1 profiles [49]
Innovations
VC-1 includes a number of innovations that enable it to produce high quality
content. This section provides brief descriptions of some of these features.
Adaptive Block Size Transform
Traditionally, 8 × 8 transforms have been used for image and video coding.
However, there is evidence to suggest that 4 × 4 transforms can reduce ringing
artifacts at edges and discontinuities. VC-1 is capable of coding an 8 × 8 block using
either an 8 × 8 transform, two 8 × 4 transforms, two 4 × 8 transforms, or four 4 × 4
transforms. This feature enables coding that takes advantage of the different transform
sizes as needed for optimal image quality.
Figure – VC-1 transform sizes [4]
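The four ways of covering an 8 × 8 block can be sketched as a simple partitioning step that precedes the transform itself. The code below only splits the block; it does not implement the VC-1 transforms. The function name split_block and the convention that a label W × H means W columns by H rows are assumptions made for illustration.

```python
def split_block(block, mode):
    """Split an 8x8 block (list of 8 rows) into sub-blocks for a transform mode.

    Assumed convention: a "WxH" label means W columns by H rows.
    """
    sizes = {"8x8": (8, 8), "8x4": (8, 4), "4x8": (4, 8), "4x4": (4, 4)}
    width, height = sizes[mode]
    return [
        [row[x:x + width] for row in block[y:y + height]]
        for y in range(0, 8, height)
        for x in range(0, 8, width)
    ]


# Hypothetical block with distinct sample values.
block = [[r * 8 + c for c in range(8)] for r in range(8)]
for mode in ("8x8", "8x4", "4x8", "4x4"):
    parts = split_block(block, mode)
    print(mode, "->", len(parts), "sub-block(s) of",
          len(parts[0][0]), "columns by", len(parts[0]), "rows")
```

An encoder would transform each sub-block separately and keep the partition that gives the best result for that block.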
16-Bit Transforms
In order to minimize the computational complexity of the decoder, VC-1 uses
16-bit transforms. This also has the advantage of easy implementation on the large
amount of digital signal processing (DSP) hardware built with 16-bit processors.
Among the constraints put on transforms specified in VC-1 is the requirement that the
16-bit values used produce results that can fit in 16 bits. The constraints on transforms
ensure that decoding is as efficient as possible on a wide range of devices.
Motion Compensation
Motion compensation is the process of generating a prediction of a video
frame by displacing the reference frame. Typically, the prediction is formed for a
block (an 8 × 8 pixel tile) or a macroblock (a 16 × 16 pixel tile) of data. The
displacement of data due to motion is defined by a motion vector, which captures the
shift along both the x- and y-axes.
Figure VC-1 motion compensation sizes [4]
The efficiency of the codec is affected by the size of the predicted block, the
granularity of sub-pixel data that can be captured, and the type of filter used for
generating sub-pixel predictors. VC-1 uses 16 × 16 blocks for prediction, with the
ability to generate mixed frames of 16 × 16 and 8 × 8 blocks. The finest granularity of
sub-pixel information supported by VC-1 is 1/4 pixel. Two sets of filters are used by
VC-1 for motion compensation. The first is an approximate bicubic filter with four
taps. The second is a bilinear filter with two taps. The four-tap bicubic filters used in
VC-1 for ¼ and ½ pixel shifts are: [-4 53 18 -3]/64 and [-1 9 9 -1]/16.
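As a worked illustration, the two four-tap kernels above can be applied to a row of samples in one dimension. This is a simplified sketch: the round-to-nearest step, the clipping to 8 bits and the function name interpolate are assumptions, and the rounding control and two-dimensional ordering of the actual VC-1 decoding process are not modelled.

```python
QUARTER_TAPS = (-4, 53, 18, -3)  # divided by 64, i.e. a right shift of 6
HALF_TAPS = (-1, 9, 9, -1)       # divided by 16, i.e. a right shift of 4


def interpolate(samples, i, taps, shift):
    """Sub-pixel value between samples[i] and samples[i + 1].

    Uses the four neighbours samples[i - 1] .. samples[i + 2], so the caller
    must ensure 1 <= i <= len(samples) - 3.
    """
    window = samples[i - 1:i + 3]
    acc = sum(t * s for t, s in zip(taps, window))
    value = (acc + (1 << (shift - 1))) >> shift  # assumed round to nearest
    return max(0, min(255, value))               # clip to the 8-bit range


row = [10, 20, 30, 40]  # hypothetical luminance ramp
half = interpolate(row, 1, HALF_TAPS, 4)
quarter = interpolate(row, 1, QUARTER_TAPS, 6)
print(half, quarter)
```

On this linear ramp the half sample lands midway between 20 and 30 and the quarter sample lands closer to 20, as expected from the tap weights.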
Figure – Integer, half and quarter pel positions [2]
(A-Q Integer, aa-hh half, a-s quarter pel positions)
VC-1 combines the motion vector settings defined by the block size, sub-pixel
resolution, and filter type into modes. The result is four motion compensation
modes that suit a range of different situations. This classification of settings into
modes also helps keep decoder implementations compact.
Loop Filtering
VC-1 uses an in-loop deblocking filter that attempts to remove block-
boundary discontinuities introduced by quantization errors in interpolated frames.
These discontinuities can cause visible artifacts in the decompressed video frames and
can impact the quality of the frame as a predictor for future interpolated frames.
Figure – Loop filtering in VC-1 [4] (Only pixel p4 and p5 are filtered)
The loop filter takes into account the adaptive block size transforms. The filter
is also optimized to reduce the number of operations required.
Interlaced Coding
Interlaced video content is widely used in television broadcasting. When
encoding interlaced content, the VC-1 codec can take advantage of the characteristics
of interlaced frames to improve compression. This is achieved by using data from
both fields to predict motion compensation in interpolated frames.
Advanced B Frame Coding
A bi-directional or B frame is a frame that is interpolated from data both in
previous and subsequent frames. B frames are distinct from I frames (also called key
frames), which are encoded without reference to other frames. B frames are also
distinct from P frames, which are interpolated from previous frames only. VC-1
includes several optimizations that make B frames more efficient. VC-1 does not have
a fixed group of pictures (GOP) structure and the number of pictures in a GOP can
vary.
Fading Compensation
Due to the nature of compression that uses motion compensation, encoding of
video frames that contain fades to or from black is very inefficient. With a uniform
fade, every macroblock needs adjustments to luminance. VC-1 includes fading
compensation, which detects fades and uses alternate methods to adjust luminance.
This feature improves compression efficiency for sequences with fading and other
global illumination changes.
Differential Quantization
Differential quantization, or dquant, is an encoding method in which multiple
quantization steps are used within a single frame. Rather than quantize the entire
frame with a single quantization level, macroblocks are identified within the frame
that might benefit from lower quantization levels and a greater number of preserved AC
coefficients. Such macroblocks are then encoded at lower quantization levels than the
one used for the remaining macroblocks in the frame. The simplest and typically most
efficient form of differential quantization involves only two quantizer levels (bi-level
dquant), but VC-1 also supports multiple levels.
MAPPING DIFFERENCES BETWEEN THE TWO STANDARDS:
The transcoding algorithm considered in this research assumes full H.264
decoding down to the pixel level, followed by a reduced complexity VC-1 encoding.
The data gathered during the H.264 decoding stage is used to accelerate the VC-1
encoding stage. It is assumed that the H.264 encoded bitstream is generated with an
R-D optimized encoder. The picture coding types used are similar in both the
standards. The transform size and type are different and make transform domain
transcoding prohibitively complex. The semantics of intra MBs are similar except for
the intra directional prediction allowed in H.264 and the mixed MBs in VC-1. The
inter prediction has significant differences including the block size of MC, block size
of transform, and reference frames used. These similarities between the codecs can be
exploited in reducing the transcoding complexity.
Intra MB Mode Mapping:
An intra MB in the incoming H.264 bitstream is coded as a VC-1 intra MB. An
H.264 intra MB can be coded as Intra 4x4 (9 different directional modes) or Intra
16x16 (4 different modes). A VC-1 intra MB, however, has four 8x8 blocks and no
prediction modes. Since an intra MB in VC-1 uses the 8x8 transform irrespective of the
block size (16x16 or 4x4) in H.264, the H.264 intra prediction type need not be
carried over. Table 2 shows the proposed intra MB mapping.
H.264 Intra MB             VC-1 Intra MB
Intra 16x16 (Any mode)     Intra MB 8x8
Intra 4x4 (Any mode)       Intra MB 8x8
Table 2 H.264 and VC-1 Intra MB mapping
Figure – Matrix for one-dimensional 8-point inverse transform [32]
Inter MB Mode Mapping:
An inter coded MB in the incoming H.264 bitstream is coded as an inter MB in
VC-1. The inter MB in H.264 has 7 different motion compensation sizes: 16x16,
16x8, 8x16, 8x8, 4x8, 8x4 and 4x4. The inter MB in VC-1 has 2 different motion
compensation sizes: 16x16 and 8x8. Another significant difference is that H.264 uses
4x4 (and 8x8 in the fidelity range extensions) transform sizes, whereas VC-1 uses 4
different transform sizes: 8x8, 4x8, 8x4 and 4x4.
The 16x16, 8x16 and 16x8 motion compensation sizes are usually selected in
H.264 for areas that are relatively uniform. Such MBs are mapped to an inter 16x16 MB in
VC-1, and the selected H.264 MC block size is used as a measure of homogeneity in the
block to choose the transform size to be applied in VC-1.
The 8x8, 8x4, 4x8 and 4x4 modes are usually selected in H.264 for areas that
have non-uniform motion. The 16x16 mode in VC-1 is eliminated for such non-
uniform MBs. The MB is then mapped to 8x8 block size in VC-1 with the H.264
block size determining the transform size to be used in VC-1.
Table 3 describes the decision making for mapping the inter MBs and the type of
transform to be used in VC-1.
H.264 Inter MB    VC-1 Inter MB    Transform size in VC-1
Inter 16x16       Inter 16x16      8x8
Inter 16x8        Inter 16x16      8x4
Inter 8x16        Inter 16x16      4x8
Inter 8x8         Inter 8x8        8x8
Inter 4x8         Inter 8x8        4x8
Inter 8x4         Inter 8x8        8x4
Inter 4x4         Inter 8x8        4x4
Table 3 H.264 and VC-1 Inter MB mapping and VC-1 transform type
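Since the decision in Table 3 depends only on the H.264 partition size, it can be held in a lookup table. The sketch below simply transcribes the table; the names INTER_MODE_MAP and map_inter_mode and the string labels are illustrative assumptions, not part of either standard or of any reference software.

```python
# H.264 partition -> (VC-1 motion compensation size, VC-1 transform size),
# transcribed from Table 3.
INTER_MODE_MAP = {
    "16x16": ("16x16", "8x8"),
    "16x8": ("16x16", "8x4"),
    "8x16": ("16x16", "4x8"),
    "8x8": ("8x8", "8x8"),
    "4x8": ("8x8", "4x8"),
    "8x4": ("8x8", "8x4"),
    "4x4": ("8x8", "4x4"),
}


def map_inter_mode(h264_partition):
    """Return the proposed VC-1 MC size and transform size for an H.264 partition."""
    try:
        return INTER_MODE_MAP[h264_partition]
    except KeyError:
        raise ValueError(f"unknown H.264 partition: {h264_partition}") from None


for partition in INTER_MODE_MAP:
    mc_size, transform = map_inter_mode(partition)
    print(f"H.264 {partition:>5} -> VC-1 inter {mc_size:>5}, transform {transform}")
```

Keeping the mapping in one table means that no mode search is needed in the VC-1 encoding stage for inter MBs.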
Motion vector mapping:
Re-use of motion vectors selected in H.264 can significantly reduce the complexity of
VC-1 encoding. Table 4 describes the re-use of motion vectors.
H.264 Inter MB    VC-1 Inter MB    Motion Vector Re-use
Inter 16x16       Inter 16x16      Same motion vectors
Inter 16x8        Inter 16x16      Average of motion vectors
Inter 8x16        Inter 16x16      Average of motion vectors
Inter 8x8         Inter 8x8        Same motion vectors
Inter 4x8         Inter 8x8        Average of motion vectors
Inter 8x4         Inter 8x8        Average of motion vectors
Inter 4x4         Inter 8x8        Average of motion vectors
Table 4 H.264 and VC-1 Inter MB motion vector mapping
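The re-use rule of Table 4 can be sketched as follows. Several details are assumptions made for illustration and are not specified in the table: motion vectors are integer (x, y) pairs in quarter-pixel units, averages are rounded to the nearest integer with halves away from zero, all four 8x8 quadrants of an MB use the same sub-partition, and the vectors are listed quadrant by quadrant.

```python
def round_div(total, count):
    """Integer division rounded to nearest, halves away from zero."""
    sign = -1 if total < 0 else 1
    return sign * ((abs(total) + count // 2) // count)


def average_mv(mvs):
    """Component-wise average of a list of (x, y) motion vectors."""
    return (round_div(sum(x for x, _ in mvs), len(mvs)),
            round_div(sum(y for _, y in mvs), len(mvs)))


def map_motion_vectors(partition, mvs):
    """Return the VC-1 motion vectors for one MB following Table 4.

    mvs holds one vector per H.264 partition, grouped per 8x8 quadrant.
    """
    if partition == "16x16":
        return [mvs[0]]                    # same motion vector
    if partition in ("16x8", "8x16"):
        return [average_mv(mvs)]           # two vectors -> one 16x16 vector
    if partition == "8x8":
        return list(mvs)                   # same four motion vectors
    per_quadrant = len(mvs) // 4           # 2 for 4x8 and 8x4, 4 for 4x4
    return [average_mv(mvs[q * per_quadrant:(q + 1) * per_quadrant])
            for q in range(4)]


print(map_motion_vectors("16x8", [(4, 8), (6, -2)]))
print(map_motion_vectors("4x8", [(0, 0), (2, 2)] * 4))
```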
Reference Pictures:
The H.264/AVC standard allows up to sixteen reference pictures for motion
estimation, while VC-1 uses only one or two, for P and B pictures
respectively. Reusing motion vectors implies using the same reference pictures to
maintain their meaning. The motion vector conversion assumes that motion vector
length is proportional to the reference picture distance [39]. The source motion vectors are
scaled, as shown in Fig 12, in order to use valid VC-1 reference pictures. This
conversion assumes constant motion between the H.264/AVC and VC-1 reference
pictures. Each motion vector is scaled by the ratio of the temporal
distances to the two reference pictures.
Fig 12 Motion vector scaling [38] (panels: H.264, VC-1)
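Under the constant motion assumption, the scaling reduces to multiplying each vector component by the ratio of the two temporal distances. The sketch below is a minimal illustration; the function names, the use of picture counts as distances and the rounding rule are assumptions and are not taken from [38] or [39].

```python
def round_div(total, count):
    """Integer division rounded to nearest, halves away from zero."""
    sign = -1 if total < 0 else 1
    return sign * ((abs(total) + count // 2) // count)


def scale_mv(mv, src_distance, dst_distance):
    """Rescale a motion vector to a different reference picture.

    mv points to an H.264 reference src_distance pictures away; the result
    points to a VC-1 reference dst_distance pictures away. Both distances
    are assumed to be positive picture counts.
    """
    if src_distance <= 0 or dst_distance <= 0:
        raise ValueError("distances must be positive")
    x, y = mv
    return (round_div(x * dst_distance, src_distance),
            round_div(y * dst_distance, src_distance))


# A vector found three pictures back, mapped to the previous picture.
print(scale_mv((12, -6), 3, 1))
```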
Skipped Macroblock:
When a skipped macroblock is signaled in the bitstream, no further data is sent for
that macroblock. The mode conversion of H.264 skip macroblocks to VC-1 skip is a
straightforward process. Since the skip macroblock definitions of both standards are
fully compatible, a direct conversion is possible.
OPEN LOOP TRANSCODER:
The open loop transcoder is designed by cascading an H.264 encoder [49], an H.264
decoder [49], a VC-1 encoder [50] and a VC-1 decoder [50].
YUV -> H.264 Encoder -> H.264 Decoder -> VC-1 Encoder -> VC-1 Decoder -> YUV
Fig 13 Open loop transcoder
Performance of open loop transcoder
Mean square error (MSE), peak-to-peak signal to noise ratio (PSNR) and structural
similarity index measure (SSIM) for Foreman QCIF (3 frames) are calculated using the
open loop transcoder.
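For reference, MSE and PSNR between an original and a transcoded frame can be computed as sketched below. The frames are assumed to be 8-bit luminance planes stored as lists of rows, and the function names are illustrative. SSIM is omitted because it needs local window statistics that do not fit a short example.

```python
import math


def mse(reference, test):
    """Mean square error between two equally sized frames (lists of rows)."""
    diffs = [(a - b) ** 2
             for ref_row, test_row in zip(reference, test)
             for a, b in zip(ref_row, test_row)]
    return sum(diffs) / len(diffs)


def psnr(reference, test, peak=255):
    """PSNR in dB for samples with the given peak value (255 for 8-bit video)."""
    error = mse(reference, test)
    if error == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / error)


# Hypothetical 2x2 frames that differ by one grey level everywhere.
original = [[16, 16], [16, 16]]
decoded = [[17, 17], [17, 17]]
print(mse(original, decoded), round(psnr(original, decoded), 2))
```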
Fig 14 MSE of open loop transcoder – Foreman sequence
Fig 15 PSNR of open loop transcoder – Foreman sequence
Fig 16 SSIM of open loop transcoder – Foreman sequence
CONCLUSIONS:
As mentioned earlier, it is proposed to transcode an H.264 bitstream to a VC-1
stream in the pixel domain (CPDT) and compare the results (MSE, PSNR, SSIM,
complexity, bit rates) against an open loop transcoder. Since
there is no re-estimation of the motion vectors, the complexity on the encoder side
is expected to reduce by about 40-50%. The road map ahead is to extract re-usable
information from the H.264 bitstream to be used in VC-1 encoding.
REFERENCES:
[1] VC-1 Compressed Video Bitstream Format and Decoding Process (SMPTE
421M-2006), SMPTE Standard, 2006.
[2] T. Wiegand et al, “Overview of the H.264/AVC video coding standard,” IEEE
Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[3] C. Chen, P-H.Wu and H. Chen, “MPEG-2 to H.264 transcoding,” Picture Coding
Symposium, pp. 15-17 Dec, 2004.
[4] Jae-Beom Lee and H. Kalva, "An efficient algorithm for VC-1 to H.264 video
transcoding in progressive compression," IEEE International Conference on
Multimedia and Expo, pp. 53-56, July 2006
[5] J Xin, C.W. Lin and M.T. Sun, “Digital video transcoding”, Proceedings of the
IEEE, Vol. 93, pp 84-97, Jan 2005.
[6] A. Vetro, C. Christopoulos and H. Sun, “Video transcoding architectures and
techniques: An overview”, IEEE Signal Processing Magazine, Vol. 20, pp 18-29,
March 2003.
[7] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 /
ISO / IEC 14496-10, Mar 2005.
[8] S. Srinivasan and S. L. Regunathan, “An overview of VC-1” Proc. SPIE, vol.
5960, pp. 720–728, 2005.
[9] P. List et al, “Adaptive deblocking filter,” IEEE Trans. Circuits Syst. Video
Technol., vol. 13, pp.614–619, Jun. 2003.
[10]T. D. Tran, J. Liang and C. Tu, “Lapped transform via time-domain pre- and post-
filtering,” IEEE Trans. Signal Proc., vol. 51, pp. 1557–1571, Jun. 2003.
[11]C. C. Cheng, T. S. Chang, and K. B. Lee, “An in-place architecture for the
deblocking filter in H.264/AVC,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol.
53, pp. 530–534, Jul. 2006.
[12]T. C. Chen et al “Analysis and architecture design of an HDTV720p 30 frames/s
H.264/AVC encoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, pp. 673
– 688, Jun. 2006.
[13]Y.-W. Huang et al “Architecture design for deblocking filter in H.264 / JVT /
AVC,” in IEEE Proc. Int. Conf. Multimedia and Expo, pp. 693–696, July 2003.
[14]S.-C. Chang et al “A platform based bus-interleaved architecture for de-blocking
filter in H.264/MPEG-4 AVC,” IEEE Trans. Consumer Electron., vol. 51, pp.
249–255, Feb 2005.
[15]M. Sima, Y. Zhou, and W. Zhang, “An efficient architecture for adaptive
deblocking filter of H.264/AVC video coding,” IEEE Trans. Consumer
Electronics, vol. 50, pp. 292–296, Feb. 2004.
[16]S.-Y. Shih, C.-R. Chang and Y.-L. Lin, “A near optimal deblocking filter for
H.264 advanced video coding” in Proc. Asia and South Pacific Design
Automation Conf., pp. 170–175, Jan 2006.
[17]T.-M. Liu et al, “A memory-efficient deblocking filter for H.264/AVC video
coding,” in Proc. IEEE Int. Symp. Circuits Syst., pp. 2140–2143, May 2005.
[18]T.-M. Liu et al, “A 125 µ W fully scalable MPEG-2 and H.264/AVC video
decoder for mobile applications,” IEEE J. Solid-State Circuits, vol. 42, pp. 161–
169, Jan. 2007.
[19]L. Li, S. Goto and T. Ikenaga, “An efficient deblocking filter architecture with 2-
dimensional parallel memory for H.264/AVC,” in Proc. Asia and South Pacific
Design Automation Conf., pp.623–626, 2005
[20]H.-Y. Lin et al “Efficient deblocking filter architecture for H.264 video coders,”
in IEEE ISCAS, pp 4, May 2006
[21]T.-M. Liu, W.-P. Lee and C.-Y. Lee, “An in/post-loop deblocking filter with
hybrid filtering schedule” IEEE Trans. Circuits Syst. for Video Technol., vol. 17,
pp. 937–943, Jul. 2007.
[22]I. Ahmad et al, “Video transcoding: An overview of various techniques and
research Issues”, IEEE Trans. on Multimedia, vol. 7, pp. 793-8, Oct. 2005
[23]Y.L Lee and T.Q Nguyen, "Analysis and efficient architecture design for VC-1
overlap smoothing and in-loop deblocking Filter," IEEE Trans Circuits and Syst.
for Video Technol, vol.18, pp 1786-1796, Dec. 2008
[24]G. Fernandez-Escribano et al, “Speeding-up the macroblock partition mode
decision for MPEG-2 to H.264 transcoding,” Proceedings of IEEE ICIP 2006,
Atlanta, pp 869-872, Sept 2006.
[25]Z. Zhou et al "Motion information and coding mode reuse for MPEG-2 to H.264
transcoding", Proceedings of the IEEE ISCAS 2005, pp 1230-1233, May 2005.
[26]B. Petljanski and H. Kalva, “DCT domain intra MB mode decision for MPEG-2
to H.264 transcoding” Proceedings of the IEEE ICCE 2006, pp. 419-420, Jan
2006.
[27]J. Bialkowski, A. Kaup and K. Illgner, “Fast transcoding of intra frames between
H.263 and H.264,” IEEE ICIP, vol.4, pp. 2785- 2788, Oct 2004.
[28]Y.-K. Lee, S.-S. Lee, and Y.-L. Lee, “MPEG-4 to H.264 transcoding using
macroblock statistics,” Proceedings of the IEEE ICME 2006, pp.57-60, Toronto,
Canada, July 2006.
[29]G. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC video coding
standard: overview and introduction to the fidelity range extensions”, SPIE
Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp.
53-74 Aug 2004.
[30]T. Wiegand et al, “Introduction to the Special Issue on Scalable Video Coding—
Standardization and Beyond” IEEE Trans on Circuits and Systems for Video
Technology, Vol 17, pp 1034, Sept 2007.
[31]Von Roden and T. Praktische, “H.261 and MPEG1- A comparison” Conference
Proceedings of the 1996 IEEE Fifteenth Annual International Phoenix Conference
on Computers and Communications, pp.65-71, Mar 1996
[32]S. Srinivasan et al, “Windows Media Video 9: overview and applications” Signal
Processing: Image Communication, Vol 19, pp 851-875, Oct 2004.
[33]S. K. Kwon, A. Tamhankar and K.R. Rao, "An overview of H.264/MPEG-4 Part
10," Special issue of Journal of Visual Communication and Image
Representation,vol.17, pp 186-216, April 2006.
[34]G.A Davidson et al, “ATSC video and audio coding”, Proc. IEEE, vol 94, pp
60-76, Jan 2006.
[35]J. Bialkowski, M. Barkowsky and A. Kaup, “Overview of low complexity video
transcoding from H.263 to H.264” IEEE ICME, pp 49-52, 2006.
[36]T. D. Nguyen et al, “Efficient MPEG-4 to H.264/AVC transcoding with spatial
downscaling”, ETRI Journal, vol.29, no.6, pp 826-828, Dec. 2007.
[37]H. Kalva, G.F. Escribano and K Kunzelmann, “Reduced resolution MPEG-2 to
H.264 transcoder” Proc. SPIE, Vol. 7257, 72571V Jan 2009.
[38]S Moiron et al, "H.264/AVC to MPEG-2 video transcoding architecture", Proc
Conf. on Telecommunications - ConfTele, Peniche, Portugal, Vol. 1, pp. 449 -
452, May, 2007.
[39]S Moiron et al, “Video transcoding from H.264/AVC to MPEG-2 with reduced
computational complexity”, Signal Processing: Image Communication, vol 24, pp
637-650, September 2009
[40]Mei-Juan Chen, Ming-Chung Chu and Chih-Wei Pan, “Efficient motion-
estimation algorithm for reduced frame-rate video transcoder”, IEEE Trans on
Circuits and Systems for Video Technology, vol. 12, pp. 269–275, Apr. 2002.
[41]ISO/IEC 11172-2:1993 Information technology -- Coding of moving pictures and
associated audio for digital storage media at up to about 1,5 Mbits/s -- Part 2:
Video
[42]H. Kalva and J.B. Lee, "The VC-1 Video Coding Standard," IEEE Multimedia,
vol. 14, pp. 88-91, Oct.-Dec. 2007
[43]P. Bordes, A. Orhand, “Improved Algorithm for fast transcoding H.264”
EUSIPCO 2007.
REFERENCE BOOKS:
[44]K. Sayood, “Introduction to Data compression”, III edition, Morgan
Kaufmann publishers, 2006.
[45]I.E.G. Richardson, “H.264 and MPEG-4 video compression: video coding for
next-generation multimedia”, Wiley, 2003.
[46]K. R. Rao and P. C. Yip, “The transform and data compression handbook”,
Boca Raton, FL: CRC press, 2001.
[47]K.R. Rao and J.J. Hwang “Techniques and Standards for Image, Video, and
Audio Coding” - Prentice Hall, 1996.
[48]J.B. Lee and H. Kalva, The VC-1 and H.264 Video Compression Standards
for Broadband Video Services, Springer, 2008.
REFERENCE WEBSITES:
[49]JM software : http://iphome.hhi.de/suehring/tml/
[50]VC-1 Software : http://www.smpte.org/home
[51]Microsoft website - VC-1 Technical Overview
http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx#VC1ComparedtoOtherCodecs
[52]VC-1 Wikipedia site - http://en.wikipedia.org/wiki/VC-1
ACRONYMS:
ASO Arbitrary slice ordering
AVC Advanced Video Coding
B MB Bi-predicted MB
CDDT Cascaded DCT Domain Transcoder
CPDT Cascaded Pixel Domain Transcoder
DCT Discrete Cosine Transform
DSP Digital Signal Processing
DVD Digital Versatile Disc
FMO Flexible macroblock ordering
FRExt Fidelity Range Extensions
GOP Group Of Pictures
I MB Intra Predicted MB
IEC International Electrotechnical Commission
ISO International Organization for Standardization
ITU-T International Telecommunication Union – Telecommunication
Standardization Sector
JVT Joint Video Team
P MB Inter Predicted MB
IDCT Inverse Discrete Cosine Transform
IQ Inverse Quantizer
MB Macroblock
ME Motion Estimation
MC Motion Compensation
MV Motion Vector
MPEG Moving Picture Experts Group
MSE Mean Square Error
PSNR Peak-to-peak Signal to Noise Ratio
Q Quantizer
R-D Rate - Distortion
SDDT Simplified DCT Domain Transcoder
SP/SI Switching P / Switching I
SMPTE Society of Motion Picture and Television Engineers
SSIM Structural Similarity Index Measure
SVC Scalable Video Coding
VCEG Video Coding Experts Group
VLC Variable Length Coding
VLD Variable Length Decoder
YUV Y- Luminance and UV- Chrominance