Fast block motion estimation with 8 bit partial sums using SIMD architecture
1. Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures
Presented by:
• Ahmed Abdel-Hafeez
• Ahmed El-Bohy
• Ahmed Emam
• Ahmed Kandil
Supervised by/Presented to:
Prof. Dr. Attalah Hashaad
Published by: Chunjiang J. Duanmu et al.
Published in August 2007.
2. Outline
• Abstract.
• Introduction.
• 8-bit partial sums.
• Multilevel 8-bit partial sums.
• Computational complexity.
• Simulation Results.
• Conclusion.
ARAB ACADEMY-CAIRO Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures spring 2013
3. Abstract
• Fast block motion estimation algorithms are needed for real-time
implementations of video coding standards due to the high computational
complexity of the full-search algorithm for block motion estimation.
• In this paper, an algorithm using 8-bit partial sums of 16 luminance values
for fast block motion estimation is proposed. The technique of using the
partial sums is employed to reduce the computational complexity of not
only the full search algorithm but also some of the fast block motion
estimation algorithms while maintaining their accuracy.
• Furthermore, it is shown that the byte-type data-parallelism on an SIMD
architecture can be utilized to access and process these partial sums
concurrently to accelerate the process of motion estimation.
• Simulation results are presented to demonstrate that the use of the
partial sums can significantly accelerate the execution of the full-search
and other search algorithms on an SIMD architecture.
4. Introduction - Applications
Basics
5. Chronological Table of Video Coding Standards
The objective of video coding is to compress moving images.
• ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
• ISO/IEC MPEG: MPEG-1 (1993), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)
• Joint ITU-T and ISO/IEC: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2002)
Timeline axis: 1990 to 2003.
6. Introduction-Basics- Video
[Figure: a video shown as a sequence of frames, Frame 1 to Frame 4]
Luminance (Y): Describes the brightness of the pixel.
Chrominance (CbCr): Describes the color of the pixel.
7. Introduction-Basics- Video Data
Drawback
• Uncompressed video data is large.
– This is due to data redundancy. There are two
general types of data redundancy in a video:
Spatial redundancy
In a frame, adjacent pixels are usually correlated,
e.g. the grass is green in the background of a frame.
Time-based redundancy
In a video, adjacent frames are usually correlated,
e.g. the green background persists frame after frame.
[Figure: Frame 1 to Frame 4]
8. Introduction-Basics- Video Compression
• Predict current frame based on previously coded
frames
• Types of coded frames:
– I-frame – Intra-coded frame, coded independently of all
other frames
– P-frame – Predictively coded frame, coded based on a
previously coded frame
– B-frame – Bi-directionally predicted frame, coded based on
both previous and future coded frames
10. Motion Estimation
• What is Motion Estimation?
– Predict current frame from previous
frame
– Determine the displacement of an object
in the video sequence
– The amount of data to be coded can be
reduced significantly if the previous frame
is subtracted from the current frame.
11. Motion Estimation Classification
• Block Based Motion Estimation Algorithms
– Time-domain Algorithms
  Matching Algorithms: Block-Matching, Feature-matching
  Gradient Based Algorithms: Pel-recursive, Block-recursive
– Frequency-domain Algorithms
  Phase-correlation (DFT), Matching in (DCT) domain, Matching in wavelet domain
• Mesh Based Motion Estimation Algorithms
15. Distortion Criterion for measuring distance between
previous block and search area block
• CCF (Cross-Correlation Function)
• MSE (Mean Square Error Function)
• MAE (Mean Absolute Error)
• SAD (Sum of Absolute Difference)
• PDC (Pixel Difference Classification)
• MAE (or MAD) and SAD are commonly employed due to their
simplicity in hardware implementation.
16. SAD
$$\mathrm{SAD}(dx,dy) = \sum_{m=x}^{x+N-1} \sum_{n=y}^{y+N-1} \left| I_k(m,n) - I_{k-1}(m+dx,\, n+dy) \right|$$
$$(MV_x, MV_y) = \arg\min_{(dx,dy) \in R^2} \mathrm{SAD}(dx,dy)$$
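The SAD criterion can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: frames are assumed to be lists of rows indexed as `frame[row][col]`, and the function name `sad` and its parameters are hypothetical.

```python
def sad(cur, ref, x, y, dx, dy, n=16):
    """Sum of absolute differences between the n x n block of the current
    frame whose top-left corner is (x, y) and the block of the reference
    frame displaced by (dx, dy).

    Assumption: frames are lists of rows, so a pixel is frame[row][col],
    and the caller keeps the displaced block inside the reference frame.
    """
    total = 0
    for j in range(n):          # rows inside the block
        for i in range(n):      # columns inside the block
            total += abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
    return total
```

The motion vector is then the displacement that minimizes this value over the search region.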
18. Search Algorithms (ctd)
• There is a trade-off between the run time and
the accuracy.
• Full search is the most accurate because it is
exhaustive, but it requires more time.
• Fast search is faster, but its accuracy is
reduced because only a subset of the candidate
positions is examined.
19. Full-Search
Not suitable for real time.
20. Exhaustive Search
• Simplest algorithm, but computationally most expensive
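A minimal sketch of exhaustive search, assuming a square search window of plus or minus `w` pixels and frames stored as lists of rows. The names `full_search`, `block_sad`, and `w` are illustrative and do not come from the slides.

```python
def block_sad(cur, ref, x, y, dx, dy, n):
    """SAD between the n x n current block at (x, y) and the reference
    block displaced by (dx, dy); frames are indexed frame[row][col]."""
    return sum(abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, x, y, n, w):
    """Try every displacement in [-w, w] x [-w, w] that keeps the block
    inside the reference frame. Returns (best_sad, dx, dy), or None if no
    candidate is in bounds."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block would leave the frame
            cost = block_sad(cur, ref, x, y, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

With `w = 7` this evaluates up to 225 candidate positions per block, each costing `n * n` absolute differences, which is why the slide calls it the most expensive option.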
21. Three Step Search (3SSA)
22. Three Step Search (3SSA) (ctd)
23. Three Step Search (3SSA) (ctd)
24. Three Step Search (3SSA) (ctd)
25. 3SSA Block Matching
► Three-Step Search (3SS)
– 9 points: central point and its 8
surroundings
– Distance: w/2
– Find the best match
– Use previous best as center
– Halve the distance, select 8 new points
– Repeat algorithm 3 times
– Examines 25 points
– Assumes a uniform
distribution of MVs
[Figure: search grid with candidate points labelled 1, 2, and 3 according to the step in which they are examined]
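The three-step procedure listed on this slide can be sketched as follows. This is an illustrative version under stated assumptions: the initial step of 4 pixels (so the steps are 4, 2, 1) is a common choice for a search range of 7 and is not taken from the slides, and the names `three_step_search` and `step` are hypothetical.

```python
def three_step_search(cur, ref, x, y, n, step=4):
    """Three-step search for the n x n block at (x, y).
    Returns (best_sad, dx, dy, points_examined).
    Assumption: frames are lists of rows, indexed frame[row][col]."""
    height, width = len(ref), len(ref[0])

    def cost(dx, dy):
        rx, ry = x + dx, y + dy
        if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
            return None  # candidate leaves the reference frame
        return sum(abs(cur[y + j][x + i] - ref[ry + j][rx + i])
                   for j in range(n) for i in range(n))

    cx, cy = 0, 0              # current search centre
    best = cost(0, 0)          # the centre is always in bounds
    examined = 1
    while step >= 1:
        bx, by = cx, cy
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue   # centre was already evaluated
                c = cost(cx + ox, cy + oy)
                if c is None:
                    continue
                examined += 1
                if c < best:
                    best, bx, by = c, cx + ox, cy + oy
        cx, cy = bx, by        # use the best match as the new centre
        step //= 2             # halve the distance
    return best, cx, cy, examined
```

When every candidate lies inside the frame, the count is 9 + 8 + 8 = 25 points, matching the figure quoted on the slide.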
26. 4SSA
27. Unrestricted center-biased Diamond
Search Algorithm (UDSA)
29. Problem Definition
• The high computational requirement of the Full
Search (FS) algorithm does not allow it to work in
real time applications, despite its high accuracy.
• Fast Block motion estimation algorithms have
lower computational complexity, but lower
accuracy.
• Fast block motion estimation algorithms are
therefore chosen for real-time applications, and
they are also the focus of this paper.
30. Aim
• To improve the accuracy of some of the fast
block motion estimation techniques without
increasing the computational complexity.
• To make best use of Single Instruction
Multiple Data (SIMD) architecture and to take
advantage of byte-type data-parallelism to
further accelerate the execution of the
algorithms to achieve the main goal.
31. Limitation
• If the partial sums for an algorithm are wider
than 8 bits, the partial sums for a reference
block cannot be stored, accessed, and
manipulated in a contiguous memory space, since
partial sums of other reference blocks lie in
between. As a result, a large number of CPU
cycles are lost in manipulating these data, and
such algorithms are not suitable for SIMD
implementations.
32. Procedure
• Devise a scheme that uses only 8-bit partial
sums and discards as many SAD computations
as possible, without excluding the optimal
motion vector.
– The proposed partial sums can be utilized not only
in the full-search algorithm but also in some of
the fast block motion-estimation algorithms.
• Devise a scheme that generalises the previous
scheme to the multi-level case and utilises it
optimally.
33. Partial Sums
  268
+ 483
-----
  600   Add the hundreds (200 + 400)
  140   Add the tens (60 + 80)
+  11   Add the ones (8 + 3)
-----
  751   Add the partial sums (600 + 140 + 11)
34. 8 Bit Partial Sums- Objective
• The objective of this paper is to find new
partial sums of only eight bits, so that they
can be of the packed byte-type on an SIMD
architecture.
• In this way, eight additions or subtractions of
the partial sums can be executed in one SIMD
instruction.
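To illustrate what "packed byte-type" means, the sketch below emulates eight byte lanes inside one 64-bit integer, so that a single integer operation adds or subtracts eight 8-bit values at once. On real hardware this is done by packed-byte instructions such as MMX PADDB and PSUBB; the Python emulation and the helper names (`pack8`, `unpack8`, `packed_add`, `packed_sub`) are assumptions made for illustration only.

```python
HIGH = 0x8080808080808080   # top bit of each of the eight byte lanes
LOW7 = 0x7F7F7F7F7F7F7F7F   # low seven bits of each byte lane


def pack8(values):
    """Pack eight integers in 0..255 into one 64-bit word (lane 0 = low byte)."""
    assert len(values) == 8 and all(0 <= v <= 255 for v in values)
    return int.from_bytes(bytes(values), "little")


def unpack8(word):
    """Inverse of pack8: split a 64-bit word into its eight byte lanes."""
    return list(word.to_bytes(8, "little"))


def packed_add(a, b):
    """Lane-wise (a + b) mod 256. The low 7 bits are added without letting a
    carry cross into the next lane; the top bit is then fixed up with XOR."""
    return ((a & LOW7) + (b & LOW7)) ^ ((a ^ b) & HIGH)


def packed_sub(a, b):
    """Lane-wise (a - b) mod 256. Setting the top bit of each lane of `a`
    prevents a borrow from crossing into the neighbouring lane."""
    return ((a | HIGH) - (b & LOW7)) ^ ((a ^ b ^ HIGH) & HIGH)
```

For example, `unpack8(packed_sub(pack8(a), pack8(b)))` gives the eight lane-wise differences modulo 256 from one subtraction on the packed words.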
36. Lower Bound
using
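One way an 8-bit partial sum can bound the SAD from below is sketched here. It is an assumption for illustration and may differ from the exact bound used in the paper. The assumed partial sum is the 12-bit sum of 16 luminance values with its low 4 bits dropped, P = S >> 4. Since S = 16P + r with 0 <= r <= 15, the difference of two true sums satisfies |S_c - S_r| >= 16|P_c - P_r| - 15, and by the triangle inequality the SAD over those 16 pixels is at least |S_c - S_r|. The function names are hypothetical.

```python
def partial_sum8(pixels16):
    """Assumed 8-bit partial sum: the sum of 16 luminance values (at most
    12 bits) with the low 4 bits dropped, so the result fits in one byte."""
    assert len(pixels16) == 16
    return sum(pixels16) >> 4


def sad_lower_bound(cur_sums, ref_sums):
    """Lower bound on the SAD of a block, given the 8-bit partial sums of
    its pixel groups in the current and reference frames.

    Each group contributes max(0, 16 * |Pc - Pr| - 15), which never exceeds
    the true SAD of that group (see the derivation in the lead-in)."""
    return sum(max(0, 16 * abs(pc - pr) - 15)
               for pc, pr in zip(cur_sums, ref_sums))
```

If this bound is already no smaller than the best SAD found so far, the full SAD at that search location cannot give a better match and can be skipped.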
37. Scheme One- Algorithm
• Step 1) Initialization
a) Compute all of the 8-bit partial sums of
sixteen luminance values for the current
frame and save them in a contiguous
memory space.
b) Retrieve all of the 8-bit partial sums of sixteen
luminance values for the reference frame from
the contiguous memory space in which they
were saved.
38. Scheme One- Algorithm (ctd)
• Step 2) For every current block, execute the block
motion-estimation process.
– Step 2.1) Initialization
39. Scheme One- Algorithm (ctd)
– Step 2.2) Search
• For (each search location in a motion-estimation
algorithm)
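The slides do not show the body of the search loop, so the following is only a sketch of how such a loop could use precomputed 8-bit partial sums to skip SAD computations. Assumptions: 16 x 16 blocks, one partial sum per row of 16 pixels defined as the sum shifted right by 4, a full-search scan order, and the conservative bound max(0, 16|Pc - Pr| - 15) per row. None of these details are confirmed by the slides, and all names are hypothetical.

```python
N = 16  # block width and number of pixels per partial sum


def row_sums8(frame):
    """Step 1: for every row and every valid start column c, store
    (sum of frame[row][c:c+16]) >> 4, using a sliding window."""
    table = []
    for row in frame:
        s = sum(row[:N])
        line = [s >> 4]
        for c in range(1, len(row) - N + 1):
            s += row[c + N - 1] - row[c - 1]
            line.append(s >> 4)
        table.append(line)
    return table


def search_with_elimination(cur, ref, x, y, w):
    """Step 2: scan all displacements in [-w, w]^2, skipping any location
    whose lower bound cannot beat the best SAD found so far.
    Returns ((best_sad, dx, dy), number_of_full_sads_computed)."""
    cur_ps, ref_ps = row_sums8(cur), row_sums8(ref)
    height, width = len(ref), len(ref[0])
    best, computed = None, 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + N > width or ry + N > height:
                continue
            bound = sum(max(0, 16 * abs(cur_ps[y + j][x] - ref_ps[ry + j][rx]) - 15)
                        for j in range(N))
            if best is not None and bound >= best[0]:
                continue  # true SAD >= bound >= best: cannot improve
            cost = sum(abs(cur[y + j][x + i] - ref[ry + j][rx + i])
                       for j in range(N) for i in range(N))
            computed += 1
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best, computed
```

Because a location is skipped only when its bound is at least the current best SAD, the result is identical to that of a plain full search; only the number of full SAD evaluations changes.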
40. Scheme One- Flow Chart
41. Multilevel 8-bit Partial Sums
16 X 16
49. Partial Sum Pyramid
[Figure: Partial Sum Pyramid with Level 1, Level 2, Level 3, Level 4 and blocks labelled 8 x 16, 4 x 16, 2 x 16, 1 x 16]
50. Multilevel 8-bit Partial Sums- Upper Bound (UB)
51. Scheme Two Algorithm
• Step 1) Initialization
a) Compute all of the 8-bit partial sums of levels
one and four for the current frame and save
them in a contiguous memory space.
b) Retrieve all of the 8-bit partial sums of levels
one and four for the reference frame from the
contiguous memory space in which they were saved.
52. Scheme Two Algorithm (ctd)
• Step 2) For every current block, execute the block
motion-estimation process.
– Step 2.1) Initialization
53. Scheme Two Algorithm (ctd)
– Step 2.2) Search
• For (each search location in a motion-estimation
algorithm)
55. Possible Conditions
Condition 1:
Condition 2:
Condition 3:
Condition 4:
59. SIMD
60. COMPUTATIONAL COMPLEXITY AND AVERAGE
NUMBER OF CPU CYCLES PER BLOCK USING FSA
61. COMPUTATIONAL COMPLEXITY AND AVERAGE
NUMBER OF CPU CYCLES PER BLOCK USING SEA
62. COMPUTATIONAL COMPLEXITY AND AVERAGE
NUMBER OF CPU CYCLES PER BLOCK USING 3SSA
63. COMPUTATIONAL COMPLEXITY AND AVERAGE
NUMBER OF CPU CYCLES PER BLOCK USING 4SSA
64. COMPUTATIONAL COMPLEXITY AND AVERAGE
NUMBER OF CPU CYCLES PER BLOCK USING UDSA
65. COMPUTATIONAL COMPLEXITY AND AVERAGE
NUMBER OF CPU CYCLES PER BLOCK USING HBSA
66. THE PERCENTAGE OF SPEEDUP OFFERED BY SIMD IMPLEMENTATION FOR
A MOTION ESTIMATION ALGORITHM WITH SCHEME 2 INCORPORATED
67. Conclusion
Introduced a new technique of 8-bit partial
sums.
The partial sums were used to make best use
of the SIMD architecture, thereby improving
the speed of motion estimation algorithms.
Since these partial sums have only 8 bits,
eight of them can be processed concurrently
using a single 64-bit SIMD register.
68. Conclusion
The notion of the 8-bit partial sums has then been
extended to the four-level case, and it has been shown that
there are 15 possible methods of utilizing these multilevel
partial sums to accelerate the block motion-estimation
algorithms without any loss of accuracy.
The full-search algorithm has then been used to determine
which one of these 15 methods provides the
lowest computational complexity, so that it can be
chosen to accelerate various motion-estimation algorithms.
69. Conclusion
Extensive simulations have been carried out to find
the average number of CPU cycles needed per block for
various algorithms incorporating the chosen method.
These simulations have shown that the proposed
scheme is capable of providing a substantial speed-up
for the various existing motion-estimation algorithms
through the reduction of their computational
complexities.
The simulation results also demonstrate that the
implementation on an SIMD architecture can further
accelerate the proposed scheme by more than 93%.