RaVioli: A Parallel Video Processing Library with Auto Resolution Adjustability
Hiroko SAKURAI†, Masaomi OHNO†, Shintaro OKADA‡, Tomoaki TSUMURA†, Hiroshi MATSUO†
† Nagoya Institute of Technology, Japan
‡ Toyota Motor Corp., Japan
IADIS International Conference APPLIED COMPUTING 2009, November 19-21, 2009, Rome, Italy

Background (1/2): Portability of Video Applications
- Real-time video processing applications should run on a great variety of platforms
  - Platforms: cell phones, cars, PCs
  - Principal goal of an application: long battery life, high throughput, good accuracy
- We must rewrite a video processing program when porting it to another platform
Applied Computing 2009

Background (2/2): Many-Core Era is Coming
- Multi/many-core processors have come into wide use
- Video processing applications have various kinds of parallelism, which promise good performance on such parallel systems
  - Pixels in video frames have data parallelism
  - Multiple frames can be processed in parallel by pipelining
- Parallelizing programs is not so simple
- It becomes increasingly important to improve compilers and libraries

A Video Processing Library: RaVioli
RaVioli provides:
- Easy writeability of pseudo real-time video processing
- Interfaces for parallelization
  - Detecting data dependencies and formulating reductions
  - Balancing loads of pipeline stages

Outline
- Concept of RaVioli
- RaVioli hides resolutions from programmers
  - Easy writeability of video processing applications
  - Pseudo real-time processing by adjusting loads
- Semi-automatic parallelization functions
  - Automatic block decomposition
  - Pipelining interface with automatic load balance mechanism
- Evaluation results

Traditional Image Processing Program
Image processing program written in traditional C (InImg is the input image, OutImg the output image):

    int main(void) {
        // Input image
        int luma;
        for (int y = 0; y < 180; y++) {
            for (int x = 0; x < 200; x++) {
                luma = (int)(InImg[y][x].R * 0.299
                           + InImg[y][x].G * 0.587
                           + InImg[y][x].B * 0.114);
                OutImg[y][x].R = luma;
                OutImg[y][x].G = luma;
                OutImg[y][x].B = luma;
            }
        }
        return 0;
    }

Image Processing Program with RaVioli
Grayscale program using RaVioli. The component function GrayScale is passed to the higher-order method procPix of the RV_Image object InImg, which produces the RV_Image object OutImg:

    // Component function
    RV_Pixel GrayScale(RV_Pixel Pix) {
        int luma;
        luma = (int)(Pix.R() * 0.299
                   + Pix.G() * 0.587
                   + Pix.B() * 0.114);
        return (Pix.setRGB(luma, luma, luma));
    }

    int main() {
        RV_Image InImg, OutImg;
        // Input image
        OutImg = InImg.procPix(GrayScale);  // higher-order method procPix
    }

Video Processing Program with RaVioli
[Figure: a per-pixel component function, RV_Pixel GrayScale(RV_Pixel p){ }, is passed to a higher-order method of an RV_Image object; in the same way, a per-image component function, RV_Image GrayScale(RV_Image img){ }, is passed to a higher-order method of an RV_Video object.]

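The two levels of higher-order methods can be sketched in plain C++. This is a minimal illustration only: Pixel, Image, Video and the method name procImg are hypothetical stand-ins invented for this sketch, not RaVioli's actual classes or API (the slides name only procPix).

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for RV_Pixel / RV_Image / RV_Video.
struct Pixel {
    std::uint8_t r, g, b;
};

struct Image {
    int width = 0, height = 0;
    std::vector<Pixel> pixels;  // row-major, width * height entries

    // Higher-order method: applies a per-pixel component function to every
    // pixel. The loop over pixels lives here, not in user code.
    Image procPix(const std::function<Pixel(Pixel)>& f) const {
        Image out;
        out.width = width;
        out.height = height;
        out.pixels.reserve(pixels.size());
        for (const Pixel& p : pixels) out.pixels.push_back(f(p));
        return out;
    }
};

struct Video {
    std::vector<Image> frames;

    // Higher-order method (hypothetical name): applies a per-image component
    // function to every frame.
    Video procImg(const std::function<Image(const Image&)>& f) const {
        Video out;
        for (const Image& img : frames) out.frames.push_back(f(img));
        return out;
    }
};

// Per-pixel component function, mirroring GrayScale on the slide.
Pixel grayScale(Pixel p) {
    auto luma = static_cast<std::uint8_t>(p.r * 0.299 + p.g * 0.587 + p.b * 0.114);
    return Pixel{luma, luma, luma};
}

// Per-image component function built from the per-pixel one.
Image grayScaleFrame(const Image& img) { return img.procPix(grayScale); }
```

A caller would write `video.procImg(grayScaleFrame)` and never touch a loop bound, which is what lets a library of this shape change the resolution behind the programmer's back.
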
Auto-Adjustment of Computation Load
- Spatial resolution (pixel rate), controlled by Ss: spatial stride
- Temporal resolution (frame rate), controlled by St: temporal stride
[Figure: going from Ss=1 to Ss=2 leaves 1/4 of the pixels; going from St=1 to St=2 leaves 1/2 of the frames.]

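The 1/4 and 1/2 factors follow from counting the samples that a stride leaves. A small sketch, assuming the spatial stride is applied in both the x and y directions (which is what the 1/4 in the figure suggests); the function names are made up for this illustration:

```cpp
#include <cstddef>

// Number of pixels visited when every Ss-th pixel is sampled in both x and y.
std::size_t sampledPixels(int width, int height, int Ss) {
    std::size_t n = 0;
    for (int y = 0; y < height; y += Ss)
        for (int x = 0; x < width; x += Ss)
            ++n;
    return n;
}

// Number of frames processed when every St-th frame is taken.
std::size_t sampledFrames(int frames, int St) {
    std::size_t n = 0;
    for (int t = 0; t < frames; t += St) ++n;
    return n;
}
```

For the 200 x 180 image used on the earlier slides, `sampledPixels(200, 180, 1)` is 36000 and `sampledPixels(200, 180, 2)` is 9000, a quarter of the load; `sampledFrames(30, 2)` is 15, half of the frames.
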
Priority Set
Which stride should be increased? We can specify resolution priorities with a priority set:
- (Spatial resolution, Temporal resolution) = (7,3): keep spatial stride and temporal stride in the ratio 3:7
- (Spatial resolution, Temporal resolution) = (1,0): keep spatial stride at 1
Examples:
- Moving object detection: temporal resolution has priority
- Pattern recognition: spatial resolution has priority

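One way such a priority set could drive the stride choice is sketched below. The slides do not give the actual selection rule, so this policy is an assumption: on each overload it raises whichever stride moves Ss : St toward (temporal priority) : (spatial priority). Strides and increaseStride are hypothetical names.

```cpp
struct Strides {
    int Ss = 1;  // spatial stride
    int St = 1;  // temporal stride
};

// Hypothetical policy, not taken from the slides: called once per detected
// overload, it increases one stride so that Ss : St approaches
// temporalPriority : spatialPriority.
Strides increaseStride(Strides s, int spatialPriority, int temporalPriority) {
    if (s.Ss * spatialPriority <= s.St * temporalPriority)
        ++s.Ss;  // spatial stride is lagging behind its share
    else
        ++s.St;  // temporal stride is lagging behind its share
    return s;
}
```

Starting from (Ss, St) = (1, 1) with the priority set (7,3), eight successive calls reach (3, 7), the 3:7 ratio named on the slide. With (1,0) the rule only ever increases St, so the spatial stride stays at 1.
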
Detecting Overload
[Figure: the RV_Video class, with a ring buffer of RV_Image instances, passes frames to the image processing program through a higher-order method. When frame interval < processing time, the system is overloaded.]

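The overload condition itself is a single comparison. A minimal sketch using std::chrono; the names Millis, isOverloaded and measure are assumptions made for this example, and how RaVioli measures time internally is not stated on the slide.

```cpp
#include <chrono>

using Millis = std::chrono::duration<double, std::milli>;

// Overloaded when processing one frame takes longer than the frame interval.
bool isOverloaded(Millis processingTime, Millis frameInterval) {
    return processingTime > frameInterval;
}

// Measures how long one call of `process` takes.
template <class F>
Millis measure(F&& process) {
    const auto start = std::chrono::steady_clock::now();
    process();
    return std::chrono::duration_cast<Millis>(std::chrono::steady_clock::now() - start);
}
```

At 30 fps the frame interval is about 33.3 ms, so `isOverloaded(Millis(40.0), Millis(1000.0 / 30))` is true and a stride would have to be increased.
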
Parallelization: Block Decomposition
Image processing with C/C++:

    typedef struct { unsigned char R, G, B; } RGB;

    int main(void) {
        RGB InImg[180][200];            /* 180 rows x 200 columns */
        unsigned char OutImg[180][200];
        for (int y = 0; y < 180; y++) {
            for (int x = 0; x < 200; x++) {
                /* index as [row][column] so that x < 200 stays in bounds */
                OutImg[y][x] = (unsigned char)(InImg[y][x].R * 0.299
                                             + InImg[y][x].G * 0.587
                                             + InImg[y][x].B * 0.114);
            }
        }
        return 0;
    }

Image processing with RaVioli:

    RV_Pix GrayScale(RV_Pix Pix) {
        int Y;
        Y = (int)(Pix.R() * 0.299
                + Pix.G() * 0.587
                + Pix.B() * 0.114);
        return (Pix.setRGB(Y, Y, Y));
    }

    int main() {
        RV_Img InImg, OutImg;
        OutImg = InImg.procPix(GrayScale);
    }

Parallelization: Block Decomposition
Image processing with RaVioli. Replacing the call procPix(GrayScale) with procPix(GrayScale, 4) divides InImg into blocks that are processed by thread1 to thread4:

    RV_Pix GrayScale(RV_Pix Pix) {
        int Y;
        Y = (int)(Pix.R() * 0.299
                + Pix.G() * 0.587
                + Pix.B() * 0.114);
        return (Pix.setRGB(Y, Y, Y));
    }

    int main() {
        RV_Img InImg, OutImg;
        OutImg = InImg.procPix(GrayScale, 4);  // was: InImg.procPix(GrayScale)
    }

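What a call like procPix(GrayScale, 4) might do internally can be sketched with std::thread. This is an assumption-laden illustration: procPixParallel and Pixel are hypothetical, and the image is split into horizontal bands of rows here, whereas the slide's figure may use a different block shape.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

struct Pixel {
    std::uint8_t r, g, b;
};

// Applies `f` to every pixel of a row-major image, splitting the rows into
// `numThreads` contiguous blocks that are processed concurrently.
std::vector<Pixel> procPixParallel(const std::vector<Pixel>& in, int width, int height,
                                   const std::function<Pixel(Pixel)>& f, int numThreads) {
    std::vector<Pixel> out(in.size());
    std::vector<std::thread> workers;
    const int rowsPerBlock = (height + numThreads - 1) / numThreads;  // ceiling division
    for (int t = 0; t < numThreads; ++t) {
        const int yBegin = std::min(height, t * rowsPerBlock);
        const int yEnd = std::min(height, yBegin + rowsPerBlock);
        // Each worker writes a disjoint range of `out`, so no locking is needed.
        workers.emplace_back([&, yBegin, yEnd] {
            for (int y = yBegin; y < yEnd; ++y)
                for (int x = 0; x < width; ++x)
                    out[y * width + x] = f(in[y * width + x]);
        });
    }
    for (std::thread& w : workers) w.join();
    return out;
}
```

This only works without synchronization because the component function touches nothing but its own pixel. As soon as it updates shared state, a reduction is needed, which is the subject of the following slides.
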
Translator for Block Decomposition
- A translator parallelizes the program by rewriting the procPix call
- Reduction operations may be required

Before translation:

    int main() {
        RV_Img InImg, OutImg;
        OutImg = InImg.procPix(GrayScale);
    }

After translation (the component function GrayScale is unchanged):

    int main() {
        RV_Img InImg, OutImg;
        OutImg = InImg.procPix(GrayScale, 4);
    }

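For Reference: Example Code with OpenMP
OpenMP: a standardized model of parallel programming for C/C++ and Fortran.

    #define NUM_THREADS 4
    int i;
    int sum = 0;
    /* reduction(+:sum) gives each thread a private copy of sum
       and adds the copies together when the loop ends */
    #pragma omp parallel for num_threads(NUM_THREADS) reduction(+:sum)
    for (i = 1; i <= 256; i++)
        sum += i;

With the reduction clause, each of the four processes accumulates a partial sum (sum1 to sum4) over its share of the iterations, and the partial sums are then combined into sum.
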
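Reduction Operations Can Be Automatically Added
Original program:

    int sum = 0;

    // Component function
    void pixSum(RV_Pixel p) {
        sum += 1;
    }

    int main() {
        RV_Image InputImg;
        // read image data into InputImg
        InputImg.procPix(pixSum);
    }

The translator checks whether the update sum += 1 satisfies the associative law and the commutative law. Both hold, so it is treated as a reduction operation: the update is split into _localsum += 1 inside the component function and sum += _localsum in a generated reduction function.

Translated program:

    int sum = 0;
    __thread int _localsum = 0;   // one copy per thread

    void pixSum(RV_Pixel p) {
        _localsum += 1;           // was: sum += 1
    }

    // Generated reduction function
    void __pixSum(int threadNum) {
        mutex_lock(&Mutex);
        sum += _localsum;
        mutex_unlock(&Mutex);
    }

    int main() {
        RV_Image InputImg;
        // read image data into InputImg
        InputImg.procPix(pixSum, 4);
        InputImg.reduction(__pixSum);
    }
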
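The same local-sum-then-merge pattern can be written with standard C++ threads. This sketch is not the translator's output; countPixels is a hypothetical function that only mirrors the roles of _localsum and __pixSum from the slide.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Counts pixels with `numThreads` workers. Each worker accumulates into a
// private local sum and merges it into the shared total once, under a mutex.
long countPixels(int width, int height, int numThreads) {
    long sum = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    const long total = static_cast<long>(width) * height;
    for (int t = 0; t < numThreads; ++t) {
        const long begin = total * t / numThreads;        // this worker's first pixel
        const long end = total * (t + 1) / numThreads;    // one past its last pixel
        workers.emplace_back([&sum, &m, begin, end] {
            long localSum = 0;                            // plays the role of _localsum
            for (long i = begin; i < end; ++i) localSum += 1;
            std::lock_guard<std::mutex> lock(m);          // plays the role of __pixSum
            sum += localSum;
        });
    }
    for (std::thread& w : workers) w.join();
    return sum;
}
```

Because addition is associative and commutative, the order in which workers merge their local sums does not affect the result, and the mutex is taken once per thread instead of once per pixel.
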
Assisting Pipeline Implementation
- To build a pipeline:
  - The whole process is split into several stages
  - Several threads are created and assigned to the stages
  - FIFOs need to be implemented and managed for data transfer between stages
- Creating threads and FIFOs is troublesome for programmers
[Figure: thread1, thread2 and thread3 running the stages binarize, edge detect and Hough transform, connected by FIFO1 to FIFO3.]

Interface for Pipelining

    RV_Pipedata* GrayScale(RV_Pipedata* data) {
        // Grayscale processing for a frame
        return data;
    }

    RV_Pipedata* Laplacian(RV_Pipedata* data) {
        // Laplacian filter processing for a frame
        return data;
    }

    int main() {
        RV_Pipeline pipe;
        pipe.push(GrayScale);
        pipe.push(Laplacian);
        pipe.run();
        return 0;
    }

[Figure: push registers GrayScale and Laplacian as stages of the RV_Pipeline object pipe; run executes them on thread1 and thread2, connected by FIFO1 and FIFO2.]

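To show what such an interface saves the programmer, here is a minimal push/run pipeline in standard C++ with one thread per stage and a blocking FIFO between consecutive stages. It is a sketch under stated assumptions: Fifo and Pipeline are hypothetical classes, stages take and return values instead of RV_Pipedata pointers, and run takes its inputs as an argument, unlike the slide's pipe.run().

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking FIFO connecting two stages.
template <class T>
class Fifo {
public:
    void put(T v) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item arrives; returns false once closed and drained.
    bool get(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// One thread per pushed stage; consecutive stages are linked by FIFOs.
template <class T>
class Pipeline {
public:
    void push(std::function<T(T)> stage) { stages_.push_back(std::move(stage)); }

    std::vector<T> run(const std::vector<T>& inputs) {
        std::vector<Fifo<T>> fifos(stages_.size() + 1);
        std::vector<std::thread> threads;
        for (std::size_t i = 0; i < stages_.size(); ++i) {
            threads.emplace_back([this, &fifos, i] {
                T item;
                while (fifos[i].get(item)) fifos[i + 1].put(stages_[i](std::move(item)));
                fifos[i + 1].close();  // propagate end-of-stream downstream
            });
        }
        for (const T& in : inputs) fifos.front().put(in);
        fifos.front().close();
        std::vector<T> outputs;
        T item;
        while (fifos.back().get(item)) outputs.push_back(std::move(item));
        for (std::thread& t : threads) t.join();
        return outputs;
    }
private:
    std::vector<std::function<T(T)>> stages_;
};
```

With `Pipeline<int> pipe; pipe.push(addOne); pipe.push(timesTwo);`, calling `pipe.run({1, 2, 3})` yields 4, 6, 8 in input order, since each stage is served by a single thread reading from a FIFO. All thread and FIFO management stays inside the two classes, which is the point the slide makes.
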
Load Imbalance between Stages
[Figure: thread1, thread2 and thread3 execute stages A, B and C on frame1, frame2 and frame3. Because the stage loads differ, the pipeline stalls.]

Automatic Load Balancing
[Figure, two animation steps: thread1, thread2 and thread3 process stages A, B and C of frame1 to frame3, with threads no longer tied one-to-one to stages.]

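Why unbinding threads from stages helps can be seen in a small deterministic simulation. The transcript does not describe RaVioli's balancing policy, so both functions below are illustrative assumptions: stage costs 1, 1, 4 are borrowed from the "A:1 B:1 C:4" annotation on a later slide, all frames are assumed available at time 0, and the dynamic scheduler is a simple greedy rule invented for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Finish time when thread i is bound to stage i (classic pipeline).
int fixedMakespan(const std::vector<int>& cost, int frames) {
    std::vector<int> stageFree(cost.size(), 0);  // when each stage's thread is next idle
    int finish = 0;
    for (int f = 0; f < frames; ++f) {
        int t = 0;  // time at which frame f leaves the previous stage
        for (std::size_t s = 0; s < cost.size(); ++s) {
            t = std::max(t, stageFree[s]) + cost[s];
            stageFree[s] = t;
        }
        finish = t;
    }
    return finish;
}

// Finish time when the earliest idle thread may run the next stage of the
// frame that has been waiting longest.
int dynamicMakespan(const std::vector<int>& cost, int frames, int threads) {
    std::vector<int> threadFree(threads, 0);
    std::vector<int> ready(frames, 0);                 // when frame f may start its next stage
    std::vector<std::size_t> nextStage(frames, 0);     // index of frame f's next stage
    int finish = 0;
    for (std::size_t done = 0; done < cost.size() * frames; ++done) {
        auto th = std::min_element(threadFree.begin(), threadFree.end());
        int f = -1;
        for (int i = 0; i < frames; ++i)
            if (nextStage[i] < cost.size() && (f < 0 || ready[i] < ready[f])) f = i;
        const int end = std::max(*th, ready[f]) + cost[nextStage[f]];
        *th = end;        // the chosen thread is busy until `end`
        ready[f] = end;   // the frame's following stage cannot start earlier
        ++nextStage[f];
        finish = std::max(finish, end);
    }
    return finish;
}
```

For three frames and costs {1, 1, 4}, the bound pipeline finishes at time 14 because every frame queues behind the heavy stage, while three unbound threads finish at time 6. Real frames arrive over time rather than all at once, so the gap in practice would be smaller than in this idealized case.
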
Evaluation: Resolution Adjustment
[Figure: frame rate (fps) and number of pixels for the priority sets (spatial resolution : temporal resolution) 0:1, 1:0 and 3:7.]

Evaluation: Parallelization Functions

Evaluation: Auto Block Decomposition
[Figure: results for voronoi, laplacian, pixAverage and hough.]

Evaluation: Hough transform
[Figure: breakdown for hough, including reduction variable initialization and reduction operations.]

Evaluation: Automatic load balancing
[Figure: timeline of stages A, B and C.]

Conclusion
- RaVioli hides resolutions from programmers
  - Pseudo real-time processing
- RaVioli has semi-automatic parallelization functions
  - Semi-automatic block decomposition
  - Load balancing mechanism between pipeline stages
- Future work
  - Implementing an automatic power-saving function in RaVioli
  - Making RaVioli adaptable to various platforms such as the Cell Broadband Engine
  - Designing an easy-to-write language that cooperates with RaVioli

Automatic Load Balancing
[Figure, two animation steps: a Manager coordinates thread1, thread2 and thread3 over stages A, B and C for frames 1 to 5; the stages are annotated A:1, B:1, C:4.]