Fast Block Motion Estimation With 8-
Bit Partial Sums Using SIMD
Architectures
Presented by:
•Ahmed Abdel-Hafeez
•Ahmed El-Bohy
•Ahmed Emam
•Ahmed Kandil
Supervised by/Presented to:
Prof. Dr. Attalah Hashaad
Published by: Chunjiang J. Duanmu et al.
Published in August 2007.
Outline
• Abstract.
• Introduction.
• 8-bit partial sums.
• Multilevel 8-bit partial sums.
• Computational complexity.
• Simulation Results.
• Conclusion.
ARAB ACADEMY-CAIRO, Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures, spring 2013
Abstract
• Fast block motion estimation algorithms are needed for real-time
implementations of video coding standards due to the high computational
complexity of the full-search algorithm for block motion estimation.
• In this paper, an algorithm using 8-bit partial sums of 16 luminance values
for fast block motion estimation is proposed. The technique of using the
partial sums is employed to reduce the computational complexity of not
only the full search algorithm but also some of the fast block motion
estimation algorithms while maintaining their accuracy.
• Furthermore, it is shown that the byte-type data-parallelism on an SIMD
architecture can be utilized to access and process these partial sums
concurrently to accelerate the process of motion estimation.
• Simulation results are presented to demonstrate that the use of the
partial sums can significantly accelerate the execution of the full-search and
other search algorithms on an SIMD architecture.
Introduction - Applications
Basics
Chronological Table of Video Coding Standards
The objective of video coding is to compress moving images.
• ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
• ISO/IEC MPEG: MPEG-1 (1993), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)
• Joint ITU-T and ISO/IEC: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2002)
Introduction - Basics - Video
[Figure: a video shown as a sequence of frames, Frame 1 to Frame 4]
Luminance (Y): Describes the brightness of the pixel.
Chrominance (CbCr): Describes the color of the pixel.
Introduction - Basics - Video Data
Drawback
• Uncompressed video data is large in size.
– This is due to data redundancy. There are two
general types of data redundancy in a video:
Spatial redundancy
In a frame, adjacent pixels are
usually correlated, e.g. the grass in the
background of a frame is green throughout.
Temporal (time-based) redundancy
In a video, adjacent frames are
usually correlated, e.g. the green
background persists frame after
frame.
Introduction - Basics - Video Compression
• Predict the current frame based on previously coded
frames
• Types of coded frames:
– I-frame – Intra-coded frame, coded independently of all
other frames
– P-frame – Predictively coded frame, coded based on a
previously coded frame
– B-frame – Bi-directionally predicted frame, coded based on
both previous and future coded frames
Block Matching
Motion Estimation
• What is Motion Estimation?
– Predict the current frame from the previous
frame
– Determine the displacement of an object
in the video sequence
– The amount of data to be coded can be
reduced significantly if the previous frame
is subtracted from the current frame.
Motion Estimation Classification
• Block Based Motion Estimation Algorithms
– Time-domain Algorithms
  · Matching Algorithms: Block-Matching, Feature-matching
  · Gradient Based Algorithms: Pel-recursive, Block-recursive
– Frequency-domain Algorithms: Phase-correlation (DFT), Matching in (DCT) domain, Matching in wavelet domain
• Mesh Based Motion Estimation Algorithms
Motion Estimation (ctd)
[Figure: block matching. The current 16x16 block of the current frame is compared with candidate blocks inside a search window of the reference frame using the Sum of Absolute Difference (SAD).]
Distortion criteria for measuring the distance between the
previous block and a search-area block
• CCF (Cross-Correlation Function)
• MSE (Mean Square Error)
• MAE (Mean Absolute Error)
• SAD (Sum of Absolute Difference)
• PDC (Pixel Difference Classification)
• MAE (also called MAD) and SAD are commonly employed because they are
simple to implement in hardware.
SAD
$$\mathrm{SAD}(dx,dy) = \sum_{m=x}^{x+N-1} \sum_{n=y}^{y+N-1} \left| I_k(m,n) - I_{k-1}(m+dx,\, n+dy) \right|$$
$$(MV_x, MV_y) = \arg\min_{(dx,dy) \in R^2} \mathrm{SAD}(dx,dy)$$
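The SAD definition can be sketched in a few lines of Python. This is only an illustration: the function name `sad`, the list-of-rows frame layout (`frame[n][m]`) and the tiny 2 x 2 example are assumptions made here, not part of the slides.

```python
def sad(cur, ref, x, y, dx, dy, n=16):
    """Sum of absolute differences for one candidate displacement (dx, dy).

    cur, ref: frames stored as lists of rows of luminance values.
    (x, y):   top-left corner of the n x n block in the current frame.
    """
    total = 0
    for row in range(y, y + n):          # n index runs over y .. y+N-1
        for col in range(x, x + n):      # m index runs over x .. x+N-1
            total += abs(cur[row][col] - ref[row + dy][col + dx])
    return total


# Tiny 2 x 2 example (n=2) so the arithmetic can be checked by hand.
cur = [[1, 2],
       [3, 4]]
ref = [[1, 2, 0],
       [3, 4, 0],
       [0, 0, 0]]
print(sad(cur, ref, 0, 0, 0, 0, n=2))   # 0: identical blocks
print(sad(cur, ref, 0, 0, 1, 0, n=2))   # 8: |1-2| + |2-0| + |3-4| + |4-0|
```

The motion vector is then the displacement whose SAD is smallest among all candidates in the search range.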
Search Algorithms
• Fast (multistep): 3SS, 4SS, HBS, UDS
• Exhaustive: Full, SE, MSE, VF, PFGSE
Search Algorithms
(ctd)
• There is a trade-off between run time and
accuracy.
• Full search is the most accurate because it is
exhaustive, but it requires more time.
• Fast search algorithms run faster but are less accurate,
because they evaluate only a subset of the candidate positions.
Full-Search (Exhaustive Search)
• Simplest algorithm, but computationally the most expensive
• Not suitable for real time.
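A minimal sketch of the exhaustive search, assuming a square search range of ±w pixels and frames stored as lists of rows. The names `full_search` and `sad` and the synthetic test frame are illustrative choices, not taken from the slides.

```python
def sad(cur, ref, x, y, dx, dy, n):
    """SAD between the n x n current block at (x, y) and the reference
    block displaced by (dx, dy)."""
    return sum(abs(cur[r][c] - ref[r + dy][c + dx])
               for r in range(y, y + n) for c in range(x, x + n))


def full_search(cur, ref, x, y, w, n=16):
    """Evaluate every displacement in [-w, w] x [-w, w] and keep the best."""
    height, width = len(ref), len(ref[0])
    best_sad, best_mv = None, None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= x + dx <= width - n and 0 <= y + dy <= height - n):
                continue
            s = sad(cur, ref, x, y, dx, dy, n)
            if best_sad is None or s < best_sad:
                best_sad, best_mv = s, (dx, dy)
    return best_mv, best_sad


# Synthetic data: the current frame is the reference shifted by (2, 1).
SIZE = 12
ref = [[(3 * r * r + 5 * c * c + 7 * r * c) % 251 for c in range(SIZE)]
       for r in range(SIZE)]
cur = [[ref[(r + 1) % SIZE][(c + 2) % SIZE] for c in range(SIZE)]
       for r in range(SIZE)]
print(full_search(cur, ref, x=4, y=4, w=2, n=4))   # ((2, 1), 0)
```

With a search range of ±w the loop evaluates up to (2w + 1)² SADs per block, which is why the slides call this search the most expensive.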
Three Step Search (3SSA)
3SSA Block Matching
►Three-Step Search (3SS)
– 9 points: the central point and its 8
surrounding points
– Initial step distance: w/2
– Find the best match
– Use the previous best match as the new center
– Halve the distance and select 8 new points
– Repeat the procedure 3 times
– Examines 25 points in total (9 + 8 + 8)
– Assumes a uniform
distribution of MVs
[Figure: 3SS search pattern; the points labeled 1, 2 and 3 are those examined in steps 1, 2 and 3]
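The three-step procedure can be sketched as follows. This is a hedged illustration, not the presenters' implementation: it assumes w = 7 with step sizes 4, 2, 1 (w/2 rounded up, then halved), and the function names and the returned point count are choices made for this example.

```python
def sad(cur, ref, x, y, dx, dy, n):
    """SAD between the n x n current block at (x, y) and the reference
    block displaced by (dx, dy)."""
    return sum(abs(cur[r][c] - ref[r + dy][c + dx])
               for r in range(y, y + n) for c in range(x, x + n))


def three_step_search(cur, ref, x, y, w=7, n=16):
    """Return (motion vector, SAD, number of points examined)."""
    height, width = len(ref), len(ref[0])

    def inside(dx, dy):
        return (abs(dx) <= w and abs(dy) <= w and
                0 <= x + dx <= width - n and 0 <= y + dy <= height - n)

    cx, cy = 0, 0                         # current search center
    best = sad(cur, ref, x, y, 0, 0, n)   # the center point is examined first
    examined = 1
    step = (w + 1) // 2                   # 4 for w = 7 (assumed rounding)
    while step >= 1:
        round_best = (best, cx, cy)
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue              # center already evaluated
                dx, dy = cx + ox, cy + oy
                if not inside(dx, dy):
                    continue
                s = sad(cur, ref, x, y, dx, dy, n)
                examined += 1
                if s < round_best[0]:
                    round_best = (s, dx, dy)
        best, cx, cy = round_best         # move the center to the best point
        step //= 2                        # halve the distance
    return (cx, cy), best, examined


SIZE = 32
frame = [[(3 * r * r + 5 * c * c + 7 * r * c) % 251 for c in range(SIZE)]
         for r in range(SIZE)]
# Identical frames: the zero vector wins and 1 + 8 + 8 + 8 = 25 points are visited.
print(three_step_search(frame, frame, x=12, y=12, w=7, n=8))   # ((0, 0), 0, 25)
```

Because only 25 of the 225 positions in a ±7 range are tested, the search can settle on a local minimum, which is the accuracy loss the slides attribute to fast algorithms.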
4SSA
Unrestricted Center-Biased Diamond
Search Algorithm (UDSA)
Hexagon-Based Search Algorithm
(HBSA)
Problem Definition
• The high computational requirement of the Full
Search (FS) algorithm does not allow it to work in
real-time applications, despite its high accuracy.
• Fast block motion estimation algorithms have
lower computational complexity, but lower
accuracy.
• Fast block motion estimation algorithms are therefore
chosen for real-time applications, and hence in this paper
too.
Aim
• To improve the accuracy of some of the fast
block motion estimation techniques without
increasing the computational complexity.
• To make best use of Single Instruction
Multiple Data (SIMD) architecture and to take
advantage of byte-type data-parallelism to
further accelerate the execution of the
algorithms to achieve the main goal.
Limitation
• If the partial sums used by an algorithm are wider
than 8 bits, the partial sums for a reference block
cannot be stored, accessed, and manipulated in a
contiguous memory space, since partial sums of
other reference blocks lie in between. Because of
this, a large number of CPU cycles are lost in
manipulating these data. As a consequence, such
algorithms are not suitable for SIMD implementations.
Procedure
• Devise a scheme that uses only 8-bit partial
sums and discards as many SAD computations
as possible, without excluding the optimal
motion vector.
– The proposed partial sums can be utilized not only
in the full-search algorithm but also in some of
the fast block motion-estimation algorithms.
• Devise a scheme that generalises the previous
scheme to the multi-level case and utilises it
optimally.
Partial Sums
  268
+ 483
Add the hundreds (200 + 400): 600
Add the tens (60 + 80): 140
Add the ones (8 + 3): 11
Add the partial sums (600 + 140 + 11): 751
8-Bit Partial Sums - Objective
• The objective of this paper is to find new
partial sums of only eight bits, so that they
can be of the packed byte-type on an SIMD
architecture.
• In this way, eight additions or subtractions of
the partial sums can be executed in one SIMD
instruction.
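The idea of handling eight 8-bit values at once can be imitated in plain Python by packing them into one 64-bit integer. This is only an emulation of packed-byte arithmetic (often called "SIMD within a register"), offered as an assumption-laden sketch; on real hardware a single packed-byte instruction does the same work. The helper names `pack8`, `unpack8`, `packed_add` and `packed_sub` are hypothetical.

```python
HIGH = 0x8080808080808080   # top bit of each of the eight byte lanes
LOW = 0x7F7F7F7F7F7F7F7F    # lower seven bits of each lane


def pack8(values):
    """Pack eight values (0..255) into one 64-bit word, lane 0 lowest."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack8(word):
    """Split a 64-bit word back into its eight byte lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def packed_add(a, b):
    """Lane-wise (a + b) mod 256 with no carry leaking between lanes."""
    return ((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)


def packed_sub(a, b):
    """Lane-wise (a - b) mod 256 with no borrow leaking between lanes."""
    return ((a | HIGH) - (b & LOW)) ^ ((a ^ ~b) & HIGH)


a = pack8([10, 20, 30, 40, 50, 60, 70, 80])
b = pack8([1, 2, 3, 4, 5, 6, 7, 8])
print(unpack8(packed_add(a, b)))   # [11, 22, 33, 44, 55, 66, 77, 88]
print(unpack8(packed_sub(a, b)))   # [9, 18, 27, 36, 45, 54, 63, 72]
```

Each call processes all eight lanes with a handful of word-wide operations, which is the byte-type data parallelism the slides refer to.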
8-bit Partial Sums
[Figure: a 16 X 16 block divided into 16 parts numbered 0 to 15, with a partial sum ∑(n) formed for part n]
Lower Bound
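The slides do not preserve the formula for the partial sums or for the bound, so the sketch below rests on two loudly stated assumptions: each partial sum covers one row of 16 luminance values, and it is reduced to 8 bits by dropping the four low bits of the row sum (an integer division by 16). Under those assumptions, SAD ≥ Σ max(0, 16·|Pc(n) − Pr(n)| − 15), because each row's SAD is at least the difference of the row sums and the discarded low bits differ by at most 15. The paper's exact definition may differ.

```python
def row_partial_sums(block):
    """One 8-bit value per row: the sum of 16 luminance values, shifted right by 4.

    Assumed definition: the largest row sum is 16 * 255 = 4080, so the
    shifted value never exceeds 255 and fits in one byte.
    """
    return [sum(row) >> 4 for row in block]


def sad_lower_bound(pc, pr):
    """Lower bound on the block SAD computed from the 8-bit partial sums only."""
    return sum(max(0, 16 * abs(a - b) - 15) for a, b in zip(pc, pr))


def block_sad(a, b):
    """Exact SAD of two equally sized blocks, for comparison."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


white = [[255] * 16 for _ in range(16)]
black = [[0] * 16 for _ in range(16)]
print(row_partial_sums(white)[0])                         # 255
print(sad_lower_bound(row_partial_sums(white),
                      row_partial_sums(black)))           # 65040
print(block_sad(white, black))                            # 65280
```

A candidate whose lower bound already reaches the best SAD found so far cannot be a better match, so its full SAD need not be computed.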
Scheme One - Algorithm
• Step 1) Initialization
a) Compute all of the 8-bit partial sums of
sixteen luminance values for the current
frame and save them in a contiguous
memory space.
b) Retrieve all of the 8-bit partial sums of sixteen
luminance values for the reference frame from the
contiguous memory space in which they were saved.
• Step 2) For every current block, execute the block
motion-estimation process.
– Step 2.1) Initialization
– Step 2.2) Search
• For (each search location in a motion-
estimation algorithm)
Scheme One - Flow Chart
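The details of Steps 2.1 and 2.2 are not preserved in the slides, so the following is only a plausible sketch of how such a scheme could be combined with a full search. It assumes row-wise partial sums reduced to 8 bits by a right shift of 4 and the bound SAD ≥ Σ max(0, 16·|Pc − Pr| − 15); all function names are hypothetical.

```python
N = 16  # block size; each partial sum covers 16 luminance values


def sad(cur, ref, x, y, dx, dy):
    return sum(abs(cur[r][c] - ref[r + dy][c + dx])
               for r in range(y, y + N) for c in range(x, x + N))


def row_sums8(frame):
    """Step 1: 8-bit partial sums for every 16-pixel row segment of a frame."""
    return [[sum(row[c:c + N]) >> 4 for c in range(len(row) - N + 1)]
            for row in frame]


def lower_bound(pc, pr, x, y, dx, dy):
    """Bound on the SAD of one candidate, using partial sums only."""
    return sum(max(0, 16 * abs(pc[y + i][x] - pr[y + dy + i][x + dx]) - 15)
               for i in range(N))


def pruned_full_search(cur, ref, pc, pr, x, y, w):
    """Step 2: full search that skips candidates ruled out by the bound."""
    height, width = len(ref), len(ref[0])
    best_sad, best_mv, skipped = None, None, 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            if not (0 <= x + dx <= width - N and 0 <= y + dy <= height - N):
                continue
            # A candidate whose bound is not below the best SAD cannot win.
            if best_sad is not None and lower_bound(pc, pr, x, y, dx, dy) >= best_sad:
                skipped += 1
                continue
            s = sad(cur, ref, x, y, dx, dy)
            if best_sad is None or s < best_sad:
                best_sad, best_mv = s, (dx, dy)
    return best_mv, best_sad, skipped


SIZE = 40
ref = [[(3 * r * r + 5 * c * c + 7 * r * c) % 251 for c in range(SIZE)]
       for r in range(SIZE)]
cur = [[ref[(r + 2) % SIZE][(c - 3) % SIZE] for c in range(SIZE)]
       for r in range(SIZE)]
mv, best, skipped = pruned_full_search(cur, ref, row_sums8(cur), row_sums8(ref),
                                       x=12, y=12, w=4)
print(mv, best)   # (-3, 2) 0
```

Since a candidate is skipped only when its bound proves it cannot beat the current best, the motion vector found is the same as that of the unpruned full search.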
Multilevel 8-bit Partial Sums
[Figure: multi-level visualisation of the partial sums of a 16 X 16 block]
Partial Sum Pyramid
[Figure: partial sum pyramid with Level 1, Level 2, Level 3 and Level 4; labels 8 x 16, 4 x 16, 2 x 16, 1 x 16]
Multilevel 8-bit Partial Sums - Upper
Bound (UB)
Scheme Two - Algorithm
• Step 1) Initialization
a) Compute all of the 8-bit partial sums of levels
one and four for the current frame and save
them in a contiguous memory space.
b) Retrieve all of the 8-bit partial sums of levels
one and four for the reference frame from the
contiguous memory space in which they were saved.
• Step 2) For every current block, execute the block
motion-estimation process.
– Step 2.1) Initialization
– Step 2.2) Search
• For (each search location in a motion-
estimation algorithm)
Scheme Two - Flow Chart
Possible Conditions
Condition 1:
Condition 2:
Condition 3:
Condition 4:
Possible Combinations
Results
AVERAGE EXECUTION TIME (IN MILLISECONDS) PER FRAME FOR VARIOUS METHODS
Possible Combinations
SIMD
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING FSA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING SEA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING 3SSA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING 4SSA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING UDSA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING HBSA
THE PERCENTAGE OF SPEEDUP OFFERED BY SIMD IMPLEMENTATION FOR
A MOTION ESTIMATION ALGORITHM WITH SCHEME 2 INCORPORATED
Conclusion
• Introduced a new technique based on 8-bit partial
sums.
• The partial sums were used to make the best use
of the SIMD architecture, thereby improving
the speed of the motion-estimation algorithms.
• Since these partial sums have only 8 bits, eight of
them can be processed concurrently using a
single 64-bit SIMD register.
Conclusion (ctd)
• The notion of the 8-bit partial sums has then been
extended to the four-level case, and it has been shown that there are
15 possible methods of utilizing these multilevel partial
sums to accelerate the block motion-estimation algorithms
without any loss of accuracy.
• The full-search algorithm has then been used to determine
which one of these 15 methods would provide the
lowest computational complexity, so that it can be
chosen to accelerate various motion-estimation algorithms.
Conclusion (ctd)
• Extensive simulations have been carried out to find
the average number of CPU cycles needed per block for
various algorithms incorporating the chosen method.
• These simulations have shown that the proposed
scheme is capable of providing a substantial speed-up
for the various existing motion-estimation algorithms
through the reduction of their computational
complexities.
• The simulation results also demonstrate that the
implementation on an SIMD architecture can further
accelerate the proposed scheme by more than 93%.
References
1. “FPGA Implementation of a Novel, Fast Motion Estimation Algorithm for Real-Time Video
Compression”, FPGA 2001, CA. USA, S. Ramachandran and S. Srinivasan, Feb. 2001
2. “Image & Video Compression for Multimedia Engineering”, Y.Q. Shi and H. Sun, 2000
3. “A New Diamond Search Algorithm for Fast Block-Matching Motion Estimation”, IEEE Trans. Image
Processing, S. Zhu and K. K. Ma, Feb. 2000
4. “A Novel Four-Step Search Algorithm for Fast Block Motion Estimation”, IEEE Trans. Circuits System,
Video Technology, L. M. Po and W. C. Ma, June 1996
5. “Successive Elimination Algorithm for Motion Estimation” W. Li and E. Salari IEEE Trans. , Jan. 1995
6. “A New Three-Step Search Algorithm for Block Motion Estimation”, IEEE Trans. Circuits System,
Video Technology, R. Li, B. Zeng, and M.L. Liou, Aug. 1994
7. “Predictive Coding Based on Efficient Motion Estimation”, IEEE Trans. on communications, R.
Srinivasan, K.R. Rao, Aug. 1985
8. “Motion Compensated Inter-Frame Coding for Video-Conferencing”, T. Koga, K. Iinuma, A. Hirano,
Y. Iijima, and T. Ishiguro, Proc. NTC81, Nov. 1981
9. “Displacement Measurement and its Applications”, IEEE Trans. on communications, J.R. Jain and
A.K Jain, Dec. 1981
ARAB ACADEMY-CAIRO Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures spring 2013 slide
71ARAB ACADEMY-CAIRO Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures spring 2013 slide

Contenu connexe

Tendances

Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...Takahiro Harada
 
FastCampus 2018 SLAM Workshop
FastCampus 2018 SLAM WorkshopFastCampus 2018 SLAM Workshop
FastCampus 2018 SLAM WorkshopDong-Won Shin
 
Innovative Solar Array Drive Assembly for CubeSat Satellite
Innovative Solar Array Drive Assembly for CubeSat SatelliteInnovative Solar Array Drive Assembly for CubeSat Satellite
Innovative Solar Array Drive Assembly for CubeSat SatelliteMichele Marino
 
Get more from your UAV Imagery
Get more from your UAV ImageryGet more from your UAV Imagery
Get more from your UAV Imagerypcigeomatics
 
Trajectory generation for Servo motor drives
Trajectory generation for Servo motor drivesTrajectory generation for Servo motor drives
Trajectory generation for Servo motor drivescontroltrix
 
Tech Days 2015: User Presentation Vermont Technical College
Tech Days 2015: User Presentation Vermont Technical CollegeTech Days 2015: User Presentation Vermont Technical College
Tech Days 2015: User Presentation Vermont Technical CollegeAdaCore
 
Analysis of KinectFusion
Analysis of KinectFusionAnalysis of KinectFusion
Analysis of KinectFusionDong-Won Shin
 

Tendances (8)

Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
 
FastCampus 2018 SLAM Workshop
FastCampus 2018 SLAM WorkshopFastCampus 2018 SLAM Workshop
FastCampus 2018 SLAM Workshop
 
Innovative Solar Array Drive Assembly for CubeSat Satellite
Innovative Solar Array Drive Assembly for CubeSat SatelliteInnovative Solar Array Drive Assembly for CubeSat Satellite
Innovative Solar Array Drive Assembly for CubeSat Satellite
 
Kintinuous review
Kintinuous reviewKintinuous review
Kintinuous review
 
Get more from your UAV Imagery
Get more from your UAV ImageryGet more from your UAV Imagery
Get more from your UAV Imagery
 
Trajectory generation for Servo motor drives
Trajectory generation for Servo motor drivesTrajectory generation for Servo motor drives
Trajectory generation for Servo motor drives
 
Tech Days 2015: User Presentation Vermont Technical College
Tech Days 2015: User Presentation Vermont Technical CollegeTech Days 2015: User Presentation Vermont Technical College
Tech Days 2015: User Presentation Vermont Technical College
 
Analysis of KinectFusion
Analysis of KinectFusionAnalysis of KinectFusion
Analysis of KinectFusion
 

Similaire à Fast block motion estimation with 8 bit partial sums using SIMD architecture

IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...IRJET Journal
 
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAEFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAVLSICS Design
 
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAEFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAVLSICS Design
 
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAEFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAVLSICS Design
 
Vlsi implimentation of a cost efficient near-lossless cfa image compression f...
Vlsi implimentation of a cost efficient near-lossless cfa image compression f...Vlsi implimentation of a cost efficient near-lossless cfa image compression f...
Vlsi implimentation of a cost efficient near-lossless cfa image compression f...Shafeek Basheer
 
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...IRJET Journal
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsScott Clark
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsSigOpt
 
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET Journal
 
IRJET - Traffic Density Estimation by Counting Vehicles using Aggregate Chann...
IRJET - Traffic Density Estimation by Counting Vehicles using Aggregate Chann...IRJET - Traffic Density Estimation by Counting Vehicles using Aggregate Chann...
IRJET - Traffic Density Estimation by Counting Vehicles using Aggregate Chann...IRJET Journal
 
AUTOMATIC NUMBERPLATE RECOGNITION
AUTOMATIC NUMBERPLATE RECOGNITIONAUTOMATIC NUMBERPLATE RECOGNITION
AUTOMATIC NUMBERPLATE RECOGNITIONIRJET Journal
 
Vizup 3D Optimization for Reality Capture (company presentation and recent us...
Vizup 3D Optimization for Reality Capture (company presentation and recent us...Vizup 3D Optimization for Reality Capture (company presentation and recent us...
Vizup 3D Optimization for Reality Capture (company presentation and recent us...vizup
 
Point cloud mesh-investigation_report-lihang
Point cloud mesh-investigation_report-lihangPoint cloud mesh-investigation_report-lihang
Point cloud mesh-investigation_report-lihangLihang Li
 
HARDWARE SOFTWARE CO-SIMULATION FOR TRAFFIC LOAD COMPUTATION USING MATLAB SIM...
HARDWARE SOFTWARE CO-SIMULATION FOR TRAFFIC LOAD COMPUTATION USING MATLAB SIM...HARDWARE SOFTWARE CO-SIMULATION FOR TRAFFIC LOAD COMPUTATION USING MATLAB SIM...
HARDWARE SOFTWARE CO-SIMULATION FOR TRAFFIC LOAD COMPUTATION USING MATLAB SIM...ijcsity
 
Hardware software co simulation of edge detection for image processing system...
Hardware software co simulation of edge detection for image processing system...Hardware software co simulation of edge detection for image processing system...
Hardware software co simulation of edge detection for image processing system...eSAT Publishing House
 
Video Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTVideo Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTIRJET Journal
 
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORSCOMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORSIRJET Journal
 

Similaire à Fast block motion estimation with 8 bit partial sums using SIMD architecture (20)

IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
 
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAEFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
 
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAEFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
 
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAEFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
 
Vlsi implimentation of a cost efficient near-lossless cfa image compression f...
Vlsi implimentation of a cost efficient near-lossless cfa image compression f...Vlsi implimentation of a cost efficient near-lossless cfa image compression f...
Vlsi implimentation of a cost efficient near-lossless cfa image compression f...
 
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
FPGA Implementation of High Speed AMBA Bus Architecture for Image Transmissio...
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
 
IRJET - Traffic Density Estimation by Counting Vehicles using Aggregate Chann...
IRJET - Traffic Density Estimation by Counting Vehicles using Aggregate Chann...IRJET - Traffic Density Estimation by Counting Vehicles using Aggregate Chann...
IRJET - Traffic Density Estimation by Counting Vehicles using Aggregate Chann...
 
Fianl_Paper
Fianl_PaperFianl_Paper
Fianl_Paper
 
AUTOMATIC NUMBERPLATE RECOGNITION
AUTOMATIC NUMBERPLATE RECOGNITIONAUTOMATIC NUMBERPLATE RECOGNITION
AUTOMATIC NUMBERPLATE RECOGNITION
 
Spark Technology Center IBM
Spark Technology Center IBMSpark Technology Center IBM
Spark Technology Center IBM
 
Vizup 3D Optimization for Reality Capture (company presentation and recent us...
Vizup 3D Optimization for Reality Capture (company presentation and recent us...Vizup 3D Optimization for Reality Capture (company presentation and recent us...
Vizup 3D Optimization for Reality Capture (company presentation and recent us...
 
Point cloud mesh-investigation_report-lihang
Point cloud mesh-investigation_report-lihangPoint cloud mesh-investigation_report-lihang
Point cloud mesh-investigation_report-lihang
 
HARDWARE SOFTWARE CO-SIMULATION FOR TRAFFIC LOAD COMPUTATION USING MATLAB SIM...
HARDWARE SOFTWARE CO-SIMULATION FOR TRAFFIC LOAD COMPUTATION USING MATLAB SIM...HARDWARE SOFTWARE CO-SIMULATION FOR TRAFFIC LOAD COMPUTATION USING MATLAB SIM...
HARDWARE SOFTWARE CO-SIMULATION FOR TRAFFIC LOAD COMPUTATION USING MATLAB SIM...
 
Hardware software co simulation of edge detection for image processing system...
Hardware software co simulation of edge detection for image processing system...Hardware software co simulation of edge detection for image processing system...
Hardware software co simulation of edge detection for image processing system...
 
Video Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTVideo Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFT
 
Cuda project paper
Cuda project paperCuda project paper
Cuda project paper
 
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORSCOMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
 

Fast block motion estimation with 8 bit partial sums using SIMD architecture

  • 1. Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures. Presented by: Ahmed Abdel-Hafeez, Ahmed El-Bohy, Ahmed Emam, Ahmed Kandil. Supervised by/Presented to: Prof. Dr. Attalah Hashaad. Published by: Chunjiang J. Duanmu et al. Published in August 2007.
  • 2. Outline • Abstract. • Introduction. • 8-bit partial sums. • Multilevel 8-bit partial sums. • Computational complexity. • Simulation Results. • Conclusion. (ARAB ACADEMY-CAIRO, Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures, spring 2013)
  • 3. Abstract • Fast block motion estimation algorithms are needed for real-time implementations of video coding standards due to the high computational complexity of the full-search algorithm for block motion estimation. • In this paper, an algorithm using 8-bit partial sums of 16 luminance values for fast block motion estimation is proposed. The technique of using the partial sums is employed to reduce the computational complexity of not only the full-search algorithm but also some of the fast block motion estimation algorithms while maintaining their accuracy. • Furthermore, it is shown that the byte-type data-parallelism on an SIMD architecture can be utilized to access and process these partial sums concurrently to accelerate the process of motion estimation. • Simulation results are presented to demonstrate that the use of the partial sums can significantly accelerate the execution of the full-search and other search algorithms on an SIMD architecture.
  • 4. Introduction - Basics - Applications
  • 5. Chronological Table of Video Coding Standards. The objective of video coding is to compress moving images. Standards from ITU-T VCEG and ISO/IEC MPEG, in chronological order: H.261 (1990); MPEG-1 (1993); MPEG-2 (H.262) (1994/95); H.263 (1995/96); H.263+ (1997/98); MPEG-4 v1 (1998/99); MPEG-4 v2 (1999/00); H.263++ (2000); MPEG-4 v3 (2001); H.264 (MPEG-4 Part 10) (2002).
  • 6. Introduction - Basics - Video. A video is a sequence of frames (Frame 1, Frame 2, Frame 3, Frame 4). Luminance (Y): describes the brightness of the pixel. Chrominance (CbCr): describes the color of the pixel.
  • 7. Introduction - Basics - Video Data Drawback • Uncompressed video data is large. – Much of this size comes from data redundancy. There are two general types of data redundancy in a video: – Spatial redundancy: in a frame, adjacent pixels are usually correlated, e.g. the grass is green in the background of a frame. – Time-based redundancy: in a video, adjacent frames (Frame 1, Frame 2, Frame 3, Frame 4) are usually correlated, e.g. the green background persists frame after frame.
  • 8. Introduction - Basics - Video Compression • Predict the current frame based on previously coded frames. • Types of coded frames: – I-frame: intra-coded frame, coded independently of all other frames. – P-frame: predictively coded frame, coded based on a previously coded frame. – B-frame: bi-directionally predicted frame, coded based on both previous and future coded frames.
  • 9. Block Matching
  • 10. Motion Estimation • What is Motion Estimation? – Predict the current frame from the previous frame. – Determine the displacement of an object in the video sequence. – The amount of data to be coded can be reduced significantly if the previous frame is subtracted from the current frame.
  • 11. Motion Estimation Classification • Block Based Motion Estimation Algorithms: – Time-domain Algorithms: Matching Algorithms (Block-Matching, Feature-Matching) and Gradient Based Algorithms (Pel-recursive, Block-recursive). – Frequency-domain Algorithms: Phase-correlation (DFT), Matching in the DCT domain, Matching in the wavelet domain. • Mesh Based Motion Estimation Algorithms.
  • 12. Motion Estimation (ctd)
  • 13. Motion Estimation (ctd)
  • 14. Motion Estimation (ctd). Figure labels: Reference Frame, Current Frame, Current 16x16 Block, Search Window, Sum of Absolute Difference (SAD).
  • 15. Distortion criteria for measuring the distance between the previous block and the search area block • CCF (Cross-Correlation Function) • MSE (Mean Square Error Function) • MAE (Mean Absolute Error) • SAD (Sum of Absolute Difference) • PDC (Pixel Difference Classification) • MAE (or MAD) and SAD are commonly employed due to their simplicity in hardware implementation.
  • 16. Sum of Absolute Difference (SAD): $SAD(dx,dy) = \sum_{m=x}^{x+N-1} \sum_{n=y}^{y+N-1} \left| I_k(m,n) - I_{k-1}(m+dx,\, n+dy) \right|$. Motion vector: $(MV_x, MV_y) = \arg\min_{(dx,dy) \in R^2} SAD(dx,dy)$.
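The SAD criterion and the exhaustive minimization over displacements on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' or the paper's code: the function names `sad` and `full_search`, the `frame[row][col]` layout, and the square search window of half-width `radius` are all assumptions made for the example.

```python
def sad(cur, ref, x, y, dx, dy, n):
    """SAD between the n x n block of `cur` whose top-left corner is (x, y)
    and the block of `ref` displaced by (dx, dy). Frames are frame[row][col]."""
    total = 0
    for row in range(y, y + n):
        for col in range(x, x + n):
            total += abs(cur[row][col] - ref[row + dy][col + dx])
    return total


def full_search(cur, ref, x, y, n, radius):
    """Evaluate every displacement in [-radius, radius]^2 that keeps the
    candidate block inside `ref`; return the best vector and its SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that would fall outside the reference frame.
            if not (0 <= x + dx <= width - n and 0 <= y + dy <= height - n):
                continue
            cost = sad(cur, ref, x, y, dx, dy, n)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad
```

Every candidate costs n x n absolute differences, which is why the later slides look for ways to discard candidates before the SAD is computed.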
  • 17. Search Algorithms • Fast (multistep): 3SS, 4SS, HBS, UDS. • Exhaustive: SE, MSE, VF, PFGSE, Full.
  • 18. Search Algorithms (ctd) • There is a trade-off between run time and accuracy. • Full search is the most accurate because it is exhaustive, but it requires more time. • Fast search is faster, but its accuracy is reduced because only a subset of the candidate positions is examined.
  • 19. Full-Search: not suitable for real time.
  • 20. Exhaustive Search • Simplest algorithm, but computationally the most expensive.
  • 21. Three Step Search (3SSA)
  • 22. Three Step Search (3SSA) (ctd)
  • 23. Three Step Search (3SSA) (ctd)
  • 24. Three Step Search (3SSA) (ctd)
  • 25. 3SSA Block Matching ► Three-Step Search (3SS) – 9 points: the central point and its 8 surroundings – Distance: w/2 – Find the best match – Use the previous best as the new center – Halve the distance and select 8 new points – Repeat the algorithm 3 times – Examines 25 points – Assumes a uniform distribution of MVs
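The three-step search listed above can be sketched as follows. This is an illustrative Python version under stated assumptions: the names `sad` and `three_step_search`, the initial step of 4 (a search range of about ±7), and the `frame[row][col]` layout are choices made for the example, not details taken from the slides.

```python
def sad(cur, ref, x, y, dx, dy, n):
    """Sum of absolute differences for the n x n block at (x, y)."""
    return sum(abs(cur[y + i][x + j] - ref[y + dy + i][x + dx + j])
               for i in range(n) for j in range(n))


def three_step_search(cur, ref, x, y, n=16, step=4):
    """Return (motion vector, SAD, number of points examined)."""
    height, width = len(ref), len(ref[0])
    cx, cy = 0, 0                       # current search center
    best = sad(cur, ref, x, y, 0, 0, n)
    examined = 1
    while step >= 1:
        center = (cx, cy)
        # Examine the 8 neighbours of the current center at distance `step`.
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue
                dx, dy = cx + ox, cy + oy
                if not (0 <= x + dx <= width - n and 0 <= y + dy <= height - n):
                    continue
                cost = sad(cur, ref, x, y, dx, dy, n)
                examined += 1
                if cost < best:
                    best, center = cost, (dx, dy)
        cx, cy = center                 # the best point becomes the new center
        step //= 2                      # halve the distance: 4, 2, 1
    return (cx, cy), best, examined
```

With three rounds of 8 new points plus the initial center, an interior block examines 1 + 3 x 8 = 25 points, which matches the count on the slide.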
  • 26. 4SSA
  • 27. Unrestricted Center-Biased Diamond Search Algorithm (UDSA)
  • 28. Hexagon-Based Search Algorithm (HBSA)
  • 29. Problem Definition • The high computational requirement of the Full Search (FS) algorithm does not allow it to work in real-time applications, despite its high accuracy. • Fast block motion estimation algorithms have lower computational complexity, but lower accuracy. • Since fast block motion estimation algorithms are the ones chosen for real-time applications, they are also considered in this paper.
  • 30. Aim • To improve the accuracy of some of the fast block motion estimation techniques without increasing the computational complexity. • To make the best use of Single Instruction Multiple Data (SIMD) architecture and to take advantage of byte-type data-parallelism to further accelerate the execution of the algorithms to achieve the main goal.
  • 31. Limitation • If the partial sums used by an algorithm are wider than 8 bits, the partial sums for a reference block cannot be placed, accessed, and manipulated in a contiguous memory space, since partial sums of other reference blocks lie in between. As a result, a large number of CPU cycles are lost in manipulating these data, and such algorithms are not suitable for SIMD implementations.
  • 32. Procedure • Devise a scheme that uses only 8-bit partial sums and discards as many SAD computations as possible, without excluding the optimal motion vector. – The proposed partial sums can be utilized not only in the full-search algorithm but also in some of the fast block motion-estimation algorithms. • Devise a scheme that generalises the previous scheme to the multilevel case and utilises it optimally.
  • 33. Partial Sums. Example: 268 + 483. Add the hundreds (200 + 400) = 600. Add the tens (60 + 80) = 140. Add the ones (8 + 3) = 11. Add the partial sums (600 + 140 + 11) = 751.
  • 34. 8-Bit Partial Sums - Objective • The objective of this paper is to find new partial sums of only eight bits, so that they can be of the packed byte-type on an SIMD architecture. • In this way, eight additions or subtractions for the partial sums can be executed in one SIMD instruction.
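To make the "eight additions in one instruction" idea concrete, the sketch below emulates a packed-byte addition on a 64-bit word in plain Python. It is only an illustration of the data layout: the helper names `pack8`, `unpack8`, and `packed_add` are hypothetical, and the bit-masking trick stands in for what a real packed-byte add instruction does in hardware.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F   # low seven bits of every byte lane
HIGH = 0x8080808080808080   # top bit of every byte lane


def pack8(values):
    """Pack eight 8-bit values into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack8(word):
    """Split a 64-bit word back into its eight byte lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def packed_add(a, b):
    """Add eight byte lanes at once, each modulo 256, with no carry
    leaking from one lane into the next."""
    low = (a & LOW7) + (b & LOW7)        # add the low 7 bits of each lane
    return low ^ ((a ^ b) & HIGH)        # fold in the top bits without carry
```

For example, `unpack8(packed_add(pack8([250, 1, 2, 3, 4, 5, 6, 7]), pack8([10, 1, 1, 1, 1, 1, 1, 255])))` gives `[4, 2, 3, 4, 5, 6, 7, 6]`: each lane wraps around independently. A value wider than 8 bits would not fit a lane, which is the motivation for keeping the partial sums at exactly eight bits.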
  • 35. 8-bit Partial Sums. Figure: a 16 x 16 block with indices 0 to 15 and partial sums ∑(n).
  • 36. Lower Bound
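The formula on the lower-bound slide did not survive in the transcript, so the following is only one possible construction, written to show why 8-bit partial sums can bound the SAD from below. It assumes that each partial sum is the sum of one 16-pixel row shifted right by 4 bits (so it fits in a byte); the names `partial_sums_8bit` and `lower_bound` are hypothetical and the exact definition in the paper may differ.

```python
def partial_sums_8bit(block):
    """One 8-bit value per row of a 16 x 16 block: the row sum (at most
    16 * 255 = 4080) shifted right by 4, which is at most 255."""
    return [sum(row) >> 4 for row in block]


def lower_bound(ps_cur, ps_ref):
    """Lower bound on the SAD of two blocks from their 8-bit partial sums.

    For a row with exact sums S and T and truncated sums s = S >> 4 and
    t = T >> 4, we have 16*s <= S <= 16*s + 15 (likewise for T), hence
    |S - T| >= 16*|s - t| - 15. The SAD of the row is at least |S - T|.
    """
    return sum(max(0, 16 * abs(s - t) - 15) for s, t in zip(ps_cur, ps_ref))


def sad_block(a, b):
    """Exact SAD of two equally sized blocks, for comparison."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

Because the bound never exceeds the true SAD, a candidate whose bound is already no better than the best SAD found so far can be skipped without computing its SAD.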
  • 37. Scheme One - Algorithm • Step 1) Initialization a) Compute all of the 8-bit partial sums of sixteen luminance values for the current frame and save them in a contiguous memory space. b) Retrieve all of the 8-bit partial sums of sixteen luminance values for the reference frame from a saved contiguous memory space.
  • 38. Scheme One - Algorithm (ctd) • Step 2) For every current block, execute the block motion-estimation process. – Step 2.1) Initialization
  • 39. Scheme One - Algorithm (ctd) – Step 2.2) Search • For each search location in a motion-estimation algorithm
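The steps of Scheme One can be sketched roughly as below. The bodies of Steps 2.1 and 2.2 are missing from the transcript, so this is a reconstruction under assumptions rather than the paper's procedure: it assumes row-wise partial sums truncated to 8 bits, a full-search scan order, and a skip rule based on a lower bound derived from those sums. All names (`row_partial_sums`, `lower_bound`, `search_with_elimination`) are hypothetical.

```python
N = 16  # block size


def row_partial_sums(frame):
    """Step 1: for every position (row, col), the sum of the 16 pixels
    starting there, shifted right by 4 so that it fits in 8 bits."""
    width = len(frame[0])
    return [[sum(row[c:c + N]) >> 4 for c in range(width - N + 1)]
            for row in frame]


def sad(cur, ref, x, y, dx, dy):
    return sum(abs(cur[y + i][x + j] - ref[y + dy + i][x + dx + j])
               for i in range(N) for j in range(N))


def lower_bound(ps_cur, ps_ref, x, y, dx, dy):
    """SAD lower bound from truncated row sums (assumed construction):
    each row contributes at least 16*|s - t| - 15, never less than 0."""
    return sum(max(0, 16 * abs(ps_cur[y + i][x] - ps_ref[y + dy + i][x + dx]) - 15)
               for i in range(N))


def search_with_elimination(cur, ref, ps_cur, ps_ref, x, y, radius):
    """Step 2: scan all candidates, computing the SAD only when the lower
    bound leaves room for an improvement. Returns (mv, sad, SADs computed)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad, computed = None, None, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not (0 <= x + dx <= width - N and 0 <= y + dy <= height - N):
                continue
            if best_sad is not None and \
                    lower_bound(ps_cur, ps_ref, x, y, dx, dy) >= best_sad:
                continue  # cannot beat the current best: skip the SAD
            cost = sad(cur, ref, x, y, dx, dy)
            computed += 1
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad, computed
```

Since a skipped candidate has a SAD no smaller than the current best, the search returns the same motion vector as a plain full search while computing fewer SADs.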
  • 40. Scheme One - Flow Chart
  • 41. Multilevel 8-bit Partial Sums (16 x 16 block)
  • 49. Partial Sum Pyramid • Level 1: 8 x 16 • Level 2: 4 x 16 • Level 3: 2 x 16 • Level 4: 1 x 16
  • 50. Multilevel 8-bit Partial Sums - Upper Bound (UB)
  • 51. Scheme Two - Algorithm • Step 1) Initialization a) Compute all of the 8-bit partial sums of levels one and four for the current frame and save them in a contiguous memory space. b) Retrieve all of the 8-bit partial sums of levels one and four for the reference frame from a saved contiguous memory space.
  • 52. Scheme Two - Algorithm (ctd) • Step 2) For every current block, execute the block motion-estimation process. – Step 2.1) Initialization
  • 53. Scheme Two - Algorithm (ctd) – Step 2.2) Search • For each search location in a motion-estimation algorithm
  • 54. Scheme Two - Flow Chart
  • 55. Possible Conditions: Condition 1, Condition 2, Condition 3, Condition 4.
  • 56. Possible Combinations
  • 57. Results: AVERAGE EXECUTION TIME (IN MILLISECONDS) PER FRAME FOR VARIOUS METHODS
  • 58. Possible Combinations
  • 59. SIMD
  • 60. COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING FSA
  • 61. COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING SEA
  • 62. COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING 3SSA
  • 63. COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING 4SSA
  • 64. COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING UDSA
  • 65. COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING HBSA
  • 66. THE PERCENTAGE OF SPEEDUP OFFERED BY SIMD IMPLEMENTATION FOR A MOTION ESTIMATION ALGORITHM WITH SCHEME 2 INCORPORATED
  • 67. Conclusion • A new technique based on 8-bit partial sums was introduced. • The partial sums were used to make the best use of SIMD architecture and hence improve the speed of the motion estimation algorithm. • Since these partial sums have only 8 bits, eight of them can be processed concurrently using a single 64-bit SIMD register.
  • 68. Conclusion • The notion of the 8-bit partial sums has then been extended to the four-level case, and it has been shown that there are 15 possible methods of utilizing these multilevel partial sums to accelerate the block motion-estimation algorithms without any loss of accuracy. • The full-search algorithm has then been used to determine which one of these 15 methods provides the lowest computational complexity, so that it can be chosen to accelerate various motion-estimation algorithms.
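The transcript does not say how the 15 methods are defined. One reading that is consistent with the count is that each method uses a non-empty subset of the four pyramid levels, since 2^4 - 1 = 15. The short enumeration below illustrates that reading only; the variable names are hypothetical.

```python
from itertools import combinations

levels = (1, 2, 3, 4)

# Every non-empty subset of the four levels, smallest subsets first.
methods = [subset
           for size in range(1, len(levels) + 1)
           for subset in combinations(levels, size)]

for number, subset in enumerate(methods, start=1):
    print(number, subset)
```

Under this reading, the pairing of levels one and four named in Scheme Two appears in the list as the subset `(1, 4)`.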
  • 69. Conclusion • Extensive simulations have been carried out to find the average number of CPU cycles needed per block for various algorithms incorporating the chosen method. • These simulations have shown that the proposed scheme is capable of providing a substantial speed-up for the various existing motion-estimation algorithms through the reduction of their computational complexities. • The simulation results also demonstrate that the implementation on an SIMD architecture can further accelerate the proposed scheme by more than 93%.