Generation of planar radiographs from 3D
             anatomical models using the GPU

                                     André dos Santos Cardoso
                                         Supervisor: Jorge M. G. Barbosa


                                             University of Porto
                                Faculty of Engineering of University of Porto


                                           11th February, 2011




André Cardoso   andre.cardoso@fe.up.pt   DRR Synthesis Algorithms
Contents

    Introduction and Context

    CUDA Platform

    Input Data

    Pre-Processing Steps

    Developed Algorithms

    Conclusion

DRRs


    • Digitally Reconstructed Radiographs – DRRs
    • Artificial Radiographs taken from vertebrae models




   Figure: L3 Vertebra, frontal DRR                   Figure: L3 Vertebra, lateral DRR




DRRs – Why?




  • Shape recovery of human spine
    ◦ 100s of DRRs per second
  • Scoliosis Evaluation
    ◦ Alternative to MRIs and CTs
Project’s Objective


    Build Fast DRR Algorithms
    • Common bottleneck!
      ◦ Applications in medical area – high throughputs are demanded
    • Take advantage of new GPUs and APIs
      ◦ Common workstations could do the job!




Existing Solution – GLSL

    • GLSL implementation – multi-pass working solution
    • Depth Peeling Based – Cass Everitt, Interactive
      Order-Independent Transparency
    • Let’s try to enhance its performance!!




Algorithm Concepts

    Figure: a ray from the X-ray source crosses the object at points P1, P2, P3 and P4 before reaching the image plane.



    • Each ray traverses the object
      ◦ Energy is attenuated
        PixelColor = exp((||P2 − P1|| + ||P4 − P3||) × AttenuationFactor)
    • Common edges may lead to artifact generation!
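
The attenuation formula above can be sketched on the CPU as follows. This is a minimal illustration, not the author's implementation: the function names are hypothetical, the intersection points are assumed to be already sorted from the source outwards, and the sign convention (a negative `attenuation_factor` so that intensity decays with thickness) is an assumption, since the slide does not state it.

```python
import math

def path_length_inside(points):
    """Sum the lengths of the entry/exit segments along one ray.

    `points` are ray-surface intersections sorted from the source
    outwards, e.g. [P1, P2, P3, P4]; the segments (P1, P2) and
    (P3, P4) are taken to lie inside the object.
    """
    if len(points) % 2 != 0:
        # Unpaired hits cannot be grouped into entry/exit segments.
        raise ValueError("expected an even number of intersections")
    return sum(math.dist(points[i], points[i + 1])
               for i in range(0, len(points), 2))

def pixel_color(points, attenuation_factor):
    # Literal form of the slide's formula. The factor is assumed to be
    # negative so that longer paths give a smaller value.
    return math.exp(path_length_inside(points) * attenuation_factor)

# Illustrative points on a ray along the z axis (not from the slides)
hits = [(0, 0, 1), (0, 0, 3), (0, 0, 6), (0, 0, 7)]
print(path_length_inside(hits))   # traversed length: 2 + 1
print(pixel_color(hits, -0.5))
```

The even-count check hints at why duplicated or missing hits are harmful: a single extra intersection shifts every entry/exit pair along the ray.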
CUDA Platform

  • Compute Unified Device Architecture
      ◦   Parallel Computing Architecture
      ◦   Exposes GPU functions and memory
      ◦   SIMT execution model
      ◦   Allows hierarchical configuration of
          threads


    • Cheap threads, dozens/hundreds of cores
      ◦ Thousands of concurrent threads!
    • GeForce GT 240
      ◦ 96 cores
      ◦ 12288 active threads
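
The thread figure above can be reproduced with simple arithmetic. The sketch below assumes 8 cores per multiprocessor and at most 1024 resident threads per multiprocessor (typical of compute capability 1.2 devices); neither number is stated on the slide. The `grid_dims` helper and the 16 × 16 block size are likewise hypothetical, shown only to illustrate the hierarchical thread configuration.

```python
# Assumed hardware figures, not taken from the slides
CORES = 96
CORES_PER_MULTIPROCESSOR = 8
MAX_RESIDENT_THREADS_PER_MULTIPROCESSOR = 1024

multiprocessors = CORES // CORES_PER_MULTIPROCESSOR
active_threads = multiprocessors * MAX_RESIDENT_THREADS_PER_MULTIPROCESSOR
print(multiprocessors, active_threads)

def grid_dims(width, height, block=(16, 16)):
    """Number of thread blocks needed to cover a width x height image."""
    blocks_x = -(-width // block[0])   # ceiling division
    blocks_y = -(-height // block[1])
    return blocks_x, blocks_y

# One thread per pixel of a 266 x 138 image, grouped in 16 x 16 blocks
print(grid_dims(266, 138))
```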

CUDA Platform – Threading and Memory




Inputs for Our Algorithms




  • Geometry file – the
      vertebrae models




Inputs for Our Algorithms

  • Camera Calibration Matrix

    \[ C = \begin{pmatrix} \alpha_u & \lambda & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{pmatrix} \]

    \[ P = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \]

    \[ K = \begin{pmatrix} R & t \\ 0_3^T & 1 \end{pmatrix} \]

    \[ s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = C \cdot P \cdot K \cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \]

    Figure: Pinhole Model

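
As a rough illustration of the pinhole model, the sketch below chains the three matrices and performs the division by s. The helper names and all numeric values are made up for the example; only the structure of C, P and K follows the slide.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def project(C, P, K, point):
    """Apply s*[u, v, 1]^T = C.P.K.[X, Y, Z, 1]^T and divide by s."""
    X, Y, Z = point
    M = matmul(matmul(C, P), K)                 # 3x4 projection matrix
    column = matmul(M, [[X], [Y], [Z], [1.0]])  # 3x1 homogeneous pixel
    su, sv, s = (row[0] for row in column)
    return su / s, sv / s

# Illustrative values only (not from the slides)
alpha_u, alpha_v, skew = 1.0, 1.0, 0.0
u0, v0 = 100.0, 50.0
f = 500.0
C = [[alpha_u, skew, u0], [0.0, alpha_v, v0], [0.0, 0.0, 1.0]]
P = [[f, 0.0, 0.0, 0.0], [0.0, f, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
# K with R = identity and t = (0, 0, 1000)
K = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1000.0],
     [0.0, 0.0, 0.0, 1.0]]

print(project(C, P, K, (10.0, -20.0, 0.0)))
```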
Pre-Processing Steps




 1. 2D Bounding Box
 2. (Projection Source)
 3. Ray Direction
      (for each pixel)
      ◦ R(t) = O + tD
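
The three pre-processing steps might look like the sketch below. It assumes the vertices have already been projected to pixel coordinates and that the world-space position of each pixel is available; the slides do not say how either is obtained, and all function names are hypothetical.

```python
import math

def bounding_box_2d(projected):
    """Axis-aligned pixel box around projected vertices [(u, v), ...]."""
    us = [u for u, _ in projected]
    vs = [v for _, v in projected]
    return (math.floor(min(us)), math.floor(min(vs)),
            math.ceil(max(us)), math.ceil(max(vs)))

def ray_direction(origin, pixel_world):
    """Unit direction D from the projection source O towards a pixel."""
    d = [p - o for p, o in zip(pixel_world, origin)]
    norm = math.sqrt(sum(c * c for c in d))
    return [c / norm for c in d]

def point_on_ray(origin, direction, t):
    """Evaluate R(t) = O + tD."""
    return [o + t * d for o, d in zip(origin, direction)]

box = bounding_box_2d([(1.2, 3.7), (4.5, 0.1), (2.0, 2.0)])
D = ray_direction((0.0, 0.0, 0.0), (0.0, 3.0, 4.0))
print(box, D, point_on_ray((0.0, 0.0, 0.0), D, 5.0))
```

With a unit-length D, the parameter t is the distance from the source, which is convenient when distances along the ray are accumulated later.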




Image Order Approach

  • Ray Casting!

  1 Thread for Each Pixel
  • Thread ⇐⇒ Ray
  • Thread loops over ALL triangles
    ◦ Tests intersections between ray and triangle
    ◦ Accumulates distances to source along ray path

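
A CPU sketch of what a single per-pixel thread would do in the image order approach is given below. The slides do not name the intersection test that was used; the Möller-Trumbore test is substituted here as a common choice, and every function name is hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: parameter t of the hit on R(t), or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                    # ray parallel to the triangle
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def cast_ray(origin, direction, triangles):
    """One thread per pixel: test the ray against ALL triangles."""
    hits = []
    for tri in triangles:
        t = ray_triangle(origin, direction, tri)
        if t is not None:
            hits.append(t)             # distance to source if |D| = 1
    return sorted(hits)

# Two parallel triangles at z = 2 and z = 5 (illustrative geometry)
tris = [((0, 0, 2), (1, 0, 2), (0, 1, 2)),
        ((0, 0, 5), (1, 0, 5), (0, 1, 5))]
print(cast_ray((0.25, 0.25, 0.0), (0.0, 0.0, 1.0), tris))
```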
Image Order Approach – Problems

 1. Many threads looping over many triangles
 2. Useless intersection tests – heavy operations!
 3. Artifacts – hard to take care of!

  L3 Vertebra Model
  • 776 vertices, 1552 triangles
  • PA perspective: 266 × 138 pixels = 36708 threads

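
To put the first two problems in numbers, the short calculation below counts the ray-triangle tests per image for the L3 model, assuming every thread really tests every triangle with no early exit or culling.

```python
triangles = 1552
width, height = 266, 138

threads = width * height                 # one thread per pixel
tests_per_image = threads * triangles    # every thread visits every triangle
print(threads, tests_per_image)
```

Under that assumption a single DRR costs roughly 57 million intersection tests, most of which cannot hit anything because a ray only crosses a handful of triangles.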
Image Order Approach – Results



  • L3 vertebra model
  • PA camera – 265 × 137
    pixels
  • GPU time only!
  • Incomplete implementation



                SLOW!

Object Order Approach

  • Ray Casting!
  • Threads spawned for each triangle
    ◦ Reverse the approach of the former algorithm!

  1 Thread for Each Triangle
  • Thread loops over each pixel covered by the triangle bounding box
    ◦ Tests intersections between ray and triangle
    ◦ Accumulates distances to source along ray path
  • Concurrency problems!

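
The per-triangle loop can be sketched as below: each triangle is reduced to the pixels of its 2D bounding box, clamped to the image. The function name and the sample triangles are hypothetical. The overlap between the two pixel sets shows where two triangle threads would touch the same pixel data.

```python
import math

def triangle_pixels(tri_2d, width, height):
    """Pixels (u, v) in the clamped bounding box of a projected triangle."""
    us = [p[0] for p in tri_2d]
    vs = [p[1] for p in tri_2d]
    u_min = max(0, math.floor(min(us)))
    v_min = max(0, math.floor(min(vs)))
    u_max = min(width - 1, math.ceil(max(us)))
    v_max = min(height - 1, math.ceil(max(vs)))
    return [(u, v)
            for v in range(v_min, v_max + 1)
            for u in range(u_min, u_max + 1)]

# Two projected triangles on a 4 x 4 image (illustrative values)
tri_a = [(0.5, 0.5), (2.5, 0.5), (0.5, 2.5)]
tri_b = [(2.0, 2.0), (5.0, 2.0), (2.0, 5.0)]
pixels_a = triangle_pixels(tri_a, 4, 4)
pixels_b = triangle_pixels(tri_b, 4, 4)

# Pixels visited by both "threads": these writes would race on a GPU
shared = sorted(set(pixels_a) & set(pixels_b))
print(len(pixels_a), len(pixels_b), shared)
```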
Object Order Approach – Problems

 1. Concurrency problems on pixel data.
      ◦ Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
 2. Still many intersection tests
 3. Artifacts still hard to avoid or correct

    Figure: concurrent threads each call "int index = atomicInc(sharedCounter);" to obtain a slot in the pixel buffer.

Object Order Approach – Results



  • L3 vertebra model
  • PA camera – 265 × 137
    pixels
  • GPU time only!
  • Incomplete implementation



                SLOW!

Multi-depth Approach - Principle

    Assume a Simplification
    • Discard the Euclidean distance between intersections!
    • Consider only distance between Fragments, along depth axis!!




    Figure: two rays from the source cross the object at P1, P2 and at P’1, P’2; the corresponding distances between fragments are labelled d1 and d2.

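
The effect of the simplification can be illustrated by comparing the two measures for a pair of fragments. The sketch assumes the depth axis is z and uses made-up coordinates; the function names are hypothetical.

```python
import math

def euclidean_thickness(p_near, p_far):
    """Exact distance between two intersections along the ray."""
    return math.dist(p_near, p_far)

def depth_thickness(p_near, p_far):
    """Simplified measure: difference along the depth (z) axis only."""
    return abs(p_far[2] - p_near[2])

# A ray along the depth axis: both measures agree
on_axis = ((0.0, 0.0, 10.0), (0.0, 0.0, 14.0))
# An oblique ray: the depth difference underestimates the true length
oblique = ((3.0, 0.0, 10.0), (4.2, 0.0, 14.0))

for near, far in (on_axis, oblique):
    print(euclidean_thickness(near, far), depth_thickness(near, far))
```

The depth difference never exceeds the Euclidean distance, so the error grows with the angle between the ray and the depth axis.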
Multi-depth Approach - Pipeline




    • Rasterization done using Scanline+Bresenham algorithm
      ◦ Filling convention avoids artifacts :) !

    • Interpolation in Integer interval
       ◦ \( \mathrm{Depth} = \frac{Z - Z_{min}}{Z_{max} - Z_{min}} \times \mathrm{INT\_MAX} \)

    • Saving depth in pixel array, raises concurrency problems (again)!
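
The integer depth mapping can be sketched as follows, assuming 32-bit signed integers for INT_MAX. An integer representation is presumably chosen because CUDA's atomicMin operates on integer types; that motivation is not stated on the slide.

```python
INT_MAX = 2**31 - 1   # assuming 32-bit signed integers

def quantize_depth(z, z_min, z_max):
    """Map z in [z_min, z_max] to an integer in [0, INT_MAX]."""
    return int((z - z_min) / (z_max - z_min) * INT_MAX)

print(quantize_depth(0.0, 0.0, 10.0),
      quantize_depth(5.0, 0.0, 10.0),
      quantize_depth(10.0, 0.0, 10.0))
```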

Multi-depth Approach - Depth Array Ordering
    atomicMin inserts each depth in the right place
      initializeDepthArrays(MAX_INTEGER)
      Znew ← interpolateDepth()
      for i = 0 to DEPTH_ARRAY_SIZE − 1 do
         Zold ← atomicMin(&(getPixelDepthArray(u, v, i)), Znew)
         if Zold == MAX_INTEGER then
            break
         end if
         Znew ← max(Znew, Zold)   // integer max: depths are stored as integers
      end for

    • Fang Liu et al, FreePipe: a programmable parallel rendering
        architecture for efficient multi-fragment effects
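
The insertion loop above can be traced with a sequential stand-in for atomicMin. This is only an emulation for a single pixel's depth array: it shows that the array ends up sorted, but it cannot demonstrate thread safety, and the helper names are hypothetical.

```python
MAX_INTEGER = 2**31 - 1

def atomic_min(array, i, value):
    """Sequential stand-in for atomicMin: store the minimum, return the old value."""
    old = array[i]
    array[i] = min(old, value)
    return old

def insert_depth(depths, z_new):
    """Insert z_new so that `depths` stays sorted in ascending order."""
    for i in range(len(depths)):
        z_old = atomic_min(depths, i, z_new)
        if z_old == MAX_INTEGER:
            break                      # an empty slot was filled
        z_new = max(z_new, z_old)      # carry the larger value forward

depths = [MAX_INTEGER] * 4             # initializeDepthArrays(MAX_INTEGER)
for z in (30, 10, 20):
    insert_depth(depths, z)
print(depths)
```

Each fragment may call atomicMin once per occupied slot, which is the cost that the later optimization tries to avoid.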
Multi-depth Approach - Results
  • Best time:
    ◦ 202 × 132 pixels
    ◦ GPU + CPU time!

      ◦ Performance With and
        Without DRR transfer to
        host!




         BETTER!
Multi-depth Optimization


    • Multi-depth allows for an ordered set of depths
      ◦ More depths =⇒ more atomicMin() calls

    We can postpone depth ordering...
      index ← atomicInc(&counter, INT_MAX)
      depthArray[index] ← Znew   // RAW-hazard free!

    • depthArray has all the depth values;
      ◦ Ordering can be done on a post-processing kernel!!!
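
A sequential sketch of the postponed ordering is given below. It assumes one counter and one depth array per pixel and a depth array large enough for all fragments; both are assumptions, as are the function names. `atomic_inc` mimics the wrap-around behaviour of CUDA's atomicInc.

```python
INT_MAX = 2**31 - 1

def atomic_inc(counter, limit):
    """Sequential stand-in for atomicInc: return the old value and store
    old + 1, wrapping to 0 once old >= limit."""
    old = counter[0]
    counter[0] = 0 if old >= limit else old + 1
    return old

def store_fragment(depth_array, counter, z_new):
    """Main kernel step: reserve a unique slot, then write the depth."""
    index = atomic_inc(counter, INT_MAX)
    depth_array[index] = z_new

def order_depths(depth_array, counter):
    """Post-processing step: sort only the slots that were written."""
    return sorted(depth_array[:counter[0]])

depth_array = [0] * 8      # capacity chosen for the example
counter = [0]              # one-element list used as a mutable counter
for z in (30, 10, 20):
    store_fragment(depth_array, counter, z)
print(depth_array[:counter[0]], order_depths(depth_array, counter))
```

Every fragment now costs a single atomic operation, and the sort runs once per pixel instead of once per fragment.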




    Figure: concurrent threads each call "int index = atomicInc(sharedCounter);" to obtain a slot in the pixel buffer.

Multi-depth Optimization – Results
    • A-buffer Scheme Versus GLSL Solution
    • 202 × 132 pixels

          Better than Current Solution

Conclusion



    • CUDA implementations for DRR extraction
      ◦ Both pre-processing and main computation tasks
      ◦ Artifact-free
    • Single geometry pass
    • Shared memory model
      ◦ May be adapted to other technologies
    • Final implementation shows better performance than GLSL




Future Work

      There’s a Big Chart to Fill Up...




    • Still some artifacts
    • Memory operations optimizations
    • Comparisons with other implementations, other geometry
      models
    • Build a DRR generation library
         ◦ possibly an open-source project



  • Participation in IJUP’11
  • Paper preparation for VIPIMAGE 2011. Abstract deadline: 15th March.


Thank You for Listening!
                                         Ask Away!





Contenu connexe

Similaire à Generation of Planar Radiographs from 3D Anatomical Models Using the GPU

Generation of planar radiographs from 3D anatomical models using the GPU
Generation of planar radiographs from 3D anatomical models using the GPUGeneration of planar radiographs from 3D anatomical models using the GPU
Generation of planar radiographs from 3D anatomical models using the GPUthyandrecardoso
 
Eee c415 digital signal processing
Eee c415 digital signal processingEee c415 digital signal processing
Eee c415 digital signal processingkaiwins
 
David Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TESDavid Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TESSysFera
 
2012 Workshop, Introduction to LiDAR Workshop, Bruce Adey and Mark Stucky (Me...
2012 Workshop, Introduction to LiDAR Workshop, Bruce Adey and Mark Stucky (Me...2012 Workshop, Introduction to LiDAR Workshop, Bruce Adey and Mark Stucky (Me...
2012 Workshop, Introduction to LiDAR Workshop, Bruce Adey and Mark Stucky (Me...GIS in the Rockies
 
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLMBIM User Day
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
Iaetsd finger print recognition by cordic algorithm and pipelined fft
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd finger print recognition by cordic algorithm and pipelined fft
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd Iaetsd
 
Efficient Evaluation of Embedded-System Design Alternatives (SPLC Tutorial 2019)
Efficient Evaluation of Embedded-System Design Alternatives (SPLC Tutorial 2019)Efficient Evaluation of Embedded-System Design Alternatives (SPLC Tutorial 2019)
Efficient Evaluation of Embedded-System Design Alternatives (SPLC Tutorial 2019)Maxime Cordy
 
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Editor IJMTER
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingElectronic Arts / DICE
 
Directive-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingDirective-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingRuymán Reyes
 
Beyond Parametric - New Approach to Geometric Constraint Solving
Beyond Parametric - New Approach to Geometric Constraint SolvingBeyond Parametric - New Approach to Geometric Constraint Solving
Beyond Parametric - New Approach to Geometric Constraint SolvingNick Sidorenko
 

Similaire à Generation of Planar Radiographs from 3D Anatomical Models Using the GPU (20)

Generation of planar radiographs from 3D anatomical models using the GPU
Generation of planar radiographs from 3D anatomical models using the GPUGeneration of planar radiographs from 3D anatomical models using the GPU
Generation of planar radiographs from 3D anatomical models using the GPU
 
Eee c415 digital signal processing
Eee c415 digital signal processingEee c415 digital signal processing
Eee c415 digital signal processing
 
David Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TESDavid Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TES
 
Thesis Defense
Thesis DefenseThesis Defense
Thesis Defense
 
2012 Workshop, Introduction to LiDAR Workshop, Bruce Adey and Mark Stucky (Me...
2012 Workshop, Introduction to LiDAR Workshop, Bruce Adey and Mark Stucky (Me...2012 Workshop, Introduction to LiDAR Workshop, Bruce Adey and Mark Stucky (Me...
2012 Workshop, Introduction to LiDAR Workshop, Bruce Adey and Mark Stucky (Me...
 
3D Laser Scanning FPSO Mystras
3D Laser Scanning FPSO Mystras3D Laser Scanning FPSO Mystras
3D Laser Scanning FPSO Mystras
 
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
5th Qatar BIM User Day, BIM Interoperability Issues: Lessons learned from PLM
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
3D Laser Scaning FPSO Mystras
3D Laser Scaning FPSO Mystras3D Laser Scaning FPSO Mystras
3D Laser Scaning FPSO Mystras
 
3D Laser Scaning FPSO Mystras
3D Laser Scaning FPSO Mystras3D Laser Scaning FPSO Mystras
3D Laser Scaning FPSO Mystras
 
DRESD In a Nutshell July07
DRESD In a Nutshell July07DRESD In a Nutshell July07
DRESD In a Nutshell July07
 
Iaetsd finger print recognition by cordic algorithm and pipelined fft
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd finger print recognition by cordic algorithm and pipelined fft
Iaetsd finger print recognition by cordic algorithm and pipelined fft
 
Efficient Evaluation of Embedded-System Design Alternatives (SPLC Tutorial 2019)
Efficient Evaluation of Embedded-System Design Alternatives (SPLC Tutorial 2019)Efficient Evaluation of Embedded-System Design Alternatives (SPLC Tutorial 2019)
Efficient Evaluation of Embedded-System Design Alternatives (SPLC Tutorial 2019)
 
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
 
EuroHPC AI in DAPHNE
EuroHPC AI in DAPHNEEuroHPC AI in DAPHNE
EuroHPC AI in DAPHNE
 
Graph Theory and Databases
Graph Theory and DatabasesGraph Theory and Databases
Graph Theory and Databases
 
Hcj 2013-01-21
Hcj 2013-01-21Hcj 2013-01-21
Hcj 2013-01-21
 
Directive-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingDirective-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous Computing
 
Beyond Parametric - New Approach to Geometric Constraint Solving
Beyond Parametric - New Approach to Geometric Constraint SolvingBeyond Parametric - New Approach to Geometric Constraint Solving
Beyond Parametric - New Approach to Geometric Constraint Solving
 

Generation of Planar Radiographs from 3D Anatomical Models Using the GPU

  • 1. Generation of planar radiographs from 3D anatomical models using the GPU André dos Santos Cardoso Supervisor: Jorge M. G. Barbosa University of Porto Faculty of Engineering of University of Porto 11th February, 2011 1/ André Cardoso andre.cardoso@fe.up.pt DRR Synthesis Algorithms 27 1/27
  • 2. Contents Introduction and Context CUDA Platform Input Data Pre-Processing Steps Developed Algorithms Conclusion 2/ André Cardoso andre.cardoso@fe.up.pt DRR Synthesis Algorithms 27 2/27
  • 3. Introduction and Context CUDA Platform Input Data Pre-Processing Steps Developed Algorithms Conclusion 2/ André Cardoso andre.cardoso@fe.up.pt DRR Synthesis Algorithms 27 2/27
  • 4. DRRs • Digitally Reconstructed Radiographs – DRRs • Artificial Radiographs taken from vertebrae models • Figure: L3 Vertebra, frontal DRR • Figure: L3 Vertebra, lateral DRR
  • 5. DRRs – Why? • Shape recovery of human spine ◦ 100s of DRRs per second • Scoliosis Evaluation ◦ Alternative to MRIs and CTs
  • 6. Project’s Objective: Build Fast DRR Algorithms • Common bottleneck! ◦ Applications in medical area – high throughputs are demanded • Take advantage of new GPUs and APIs ◦ Common workstations could do the job!
  • 7. Existing Solution – GLSL • GLSL implementation – multi-pass working solution • Depth Peeling Based – Cass Everitt, Interactive Order-Independent Transparency • Let’s try to enhance its performance!
  • 8. Algorithm Concepts • Figure labels: X-ray source, Object, intersection points P1 to P4, Image Plane. Callout: Problem! Potential Artifact Generation! • Each ray traverses the object ◦ Energy is attenuated ◦ PixelColor = exp((||P2 − P1|| + ||P4 − P3||) × AttenuationFactor) • Common edges may lead to artifact generation!
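The attenuation formula on slide 8 can be sketched on the host in plain C++. This is a minimal illustration under stated assumptions, not the presented implementation: the `Vec3` type and the function names are hypothetical, the intersection points are assumed to arrive ordered from the source as entry/exit pairs, and `attenuationFactor` is assumed to be passed as a non-positive number so that longer paths through the object give smaller values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 3D point type (not taken from the slides).
struct Vec3 { double x, y, z; };

// Euclidean length of the segment between two points.
double segmentLength(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// hits: ray/surface intersections ordered from the source (P1, P2, P3, P4, ...).
// Each consecutive (entry, exit) pair bounds one stretch travelled inside the object.
// attenuationFactor is used exactly as in the slide formula; a value <= 0 is
// assumed here so that the result decays with the path length.
double pixelColor(const std::vector<Vec3>& hits, double attenuationFactor) {
    assert(hits.size() % 2 == 0 && "a closed mesh yields entry/exit pairs");
    double pathInside = 0.0;
    for (std::size_t i = 0; i + 1 < hits.size(); i += 2) {
        pathInside += segmentLength(hits[i], hits[i + 1]);
    }
    return std::exp(pathInside * attenuationFactor);
}
```

An odd number of hits, which can occur when a ray grazes an edge shared by two triangles, trips the assertion; this is one way the artifact problem mentioned on the slide shows up.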
  • 10. CUDA Platform • Compute Unified Device Architecture ◦ Parallel Computing Architecture ◦ Exposes GPU functions and memory ◦ SIMT execution model ◦ Allows hierarchical configuration of threads • Cheap threads, dozens/hundreds of cores ◦ Thousands of concurrent threads! • GeForce GT 240 ◦ 96 cores ◦ 12288 active threads
  • 11. CUDA Platform – Threading and Memory
  • 13. Inputs for Our Algorithms • Geometry file – the vertebrae models
  • 15. Inputs for Our Algorithms • Camera Calibration Matrix
  • 16. Inputs for Our Algorithms • Camera Calibration Matrix • Figure: Pinhole Model
      $C = \begin{bmatrix} \alpha_u & \lambda & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}$, $P = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$, $K = \begin{bmatrix} R & t \\ 0_3^T & 1 \end{bmatrix}$
      $s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = C \cdot P \cdot K \cdot \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$
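As a rough illustration of the pinhole relation on slide 16, the sketch below applies K, then P, then C to a 3D point and divides by the scale s. The `Camera` and `Pixel` structs and the `project` function are hypothetical names chosen for this example; only the symbols (alpha_u, alpha_v, lambda, u_0, v_0, f, R, t) come from the slide.

```cpp
#include <array>
#include <cassert>

// Hypothetical parameter bundle mirroring the symbols on the slide.
struct Camera {
    double alphaU, alphaV, lambda, u0, v0;      // entries of C
    double f;                                   // focal length in P
    std::array<std::array<double, 3>, 3> R;     // rotation block of K
    std::array<double, 3> t;                    // translation block of K
};

struct Pixel { double u, v; };

// Applies s [u v 1]^T = C . P . K . [X Y Z 1]^T and divides by s.
Pixel project(const Camera& cam, const std::array<double, 3>& X) {
    // K: world coordinates -> camera coordinates (R * X + t).
    std::array<double, 3> c{};
    for (int i = 0; i < 3; ++i) {
        c[i] = cam.R[i][0] * X[0] + cam.R[i][1] * X[1] + cam.R[i][2] * X[2] + cam.t[i];
    }
    // P: perspective projection, dropping the homogeneous coordinate.
    const double px = cam.f * c[0], py = cam.f * c[1], pz = c[2];
    assert(pz != 0.0 && "point lies in the plane of the projection centre");
    // C: mapping to pixel units; the scale s equals pz.
    const double su = cam.alphaU * px + cam.lambda * py + cam.u0 * pz;
    const double sv = cam.alphaV * py + cam.v0 * pz;
    return {su / pz, sv / pz};
}
```

With an identity rotation, zero translation, f = 2, alpha_u = alpha_v = 100, lambda = 0 and (u_0, v_0) = (50, 40), the point (1, 2, 4) lands on pixel (100, 140).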
  • 18. Pre-Processing Steps 1. 2D Bounding Box
  • 20. Pre-Processing Steps 1. 2D Bounding Box 2. (Projection Source)
  • 22. Pre-Processing Steps 1. 2D Bounding Box 2. (Projection Source) 3. Ray Direction (for each pixel) ◦ R(t) = O + tD
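Two of the pre-processing steps can be sketched in a few lines. The slides do not show how the bounding box is computed, so the version below is an assumption: it takes the already projected vertices and returns the smallest integer pixel rectangle that contains them. `pointOnRay` simply evaluates R(t) = O + tD. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Point2 { double u, v; };                 // projected vertex, in pixel units
struct Vec3 { double x, y, z; };
struct Box2 { int uMin, vMin, uMax, vMax; };    // inclusive pixel rectangle

// Smallest integer pixel rectangle containing every projected vertex.
Box2 boundingBox(const std::vector<Point2>& projected) {
    assert(!projected.empty());
    double uMin = projected[0].u, uMax = projected[0].u;
    double vMin = projected[0].v, vMax = projected[0].v;
    for (const Point2& p : projected) {
        uMin = std::min(uMin, p.u); uMax = std::max(uMax, p.u);
        vMin = std::min(vMin, p.v); vMax = std::max(vMax, p.v);
    }
    return {static_cast<int>(std::floor(uMin)), static_cast<int>(std::floor(vMin)),
            static_cast<int>(std::ceil(uMax)), static_cast<int>(std::ceil(vMax))};
}

// Evaluates the ray R(t) = O + tD.
Vec3 pointOnRay(const Vec3& O, const Vec3& D, double t) {
    return {O.x + t * D.x, O.y + t * D.y, O.z + t * D.z};
}
```

Restricting the image to this rectangle keeps the number of pixels, and therefore the number of rays, as small as the model allows.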
  • 25. Image Order Approach: 1 Thread for Each Pixel • Thread ⇐⇒ Ray • Thread loops over ALL triangles • Ray Casting! ◦ Tests intersections between ray and triangle ◦ Accumulates distances to source along ray path
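The per-pixel loop of the image order approach can be sketched as follows. The slides do not name the ray/triangle test, so the Möller-Trumbore algorithm is used here purely as an assumed stand-in; `Vec3`, `Triangle`, `intersect` and `castRay` are hypothetical names. On the GPU, the body of `castRay` would correspond to the work of one thread.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
struct Triangle { Vec3 v0, v1, v2; };

Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Moller-Trumbore test of the ray R(t) = O + tD against one triangle.
// On a hit, writes the ray parameter to tHit and returns true.
bool intersect(const Vec3& O, const Vec3& D, const Triangle& tri, double& tHit) {
    const double eps = 1e-12;
    const Vec3 e1 = sub(tri.v1, tri.v0), e2 = sub(tri.v2, tri.v0);
    const Vec3 p = cross(D, e2);
    const double det = dot(e1, p);
    if (std::fabs(det) < eps) return false;     // ray parallel to the triangle plane
    const double inv = 1.0 / det;
    const Vec3 s = sub(O, tri.v0);
    const double u = dot(s, p) * inv;           // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return false;
    const Vec3 q = cross(s, e1);
    const double v = dot(D, q) * inv;           // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return false;
    tHit = dot(e2, q) * inv;
    return tHit > eps;                          // keep only hits in front of the source
}

// One "thread": loops over ALL triangles and collects the hit parameters,
// sorted from the source outwards. If D has unit length, each value is the
// distance from the source to the hit.
std::vector<double> castRay(const Vec3& O, const Vec3& D, const std::vector<Triangle>& mesh) {
    std::vector<double> hits;
    for (const Triangle& tri : mesh) {
        double t = 0.0;
        if (intersect(O, D, tri, t)) hits.push_back(t);
    }
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

Every ray pays for a full pass over the mesh even when it misses the model entirely, which is the cost the following slides point out.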
  • 26. Image Order Approach – Problems 1. Many threads looping over many triangles • L3 Vertebra Model ◦ 776 vertices, 1552 triangles ◦ PA perspective: 266 × 138 pixels = 36708 threads
  • 27. Image Order Approach – Problems 1. Many threads looping over many triangles 2. Useless intersection tests – heavy operations!
  • 28. Image Order Approach – Problems 1. Many threads looping over many triangles 2. Useless intersection tests – heavy operations! 3. Artifacts – hard to take care of!
  • 29. Image Order Approach – Results • L3 vertebra model • PA camera – 265 × 137 pixels • GPU time only! • Incomplete implementation • SLOW!
  • 30. Object Order Approach: 1 Thread for Each Triangle • Threads spawned for each triangle ◦ Reverse the approach of the former algorithm! • Ray Casting! • Thread loops over each pixel covered by the triangle bounding box ◦ Tests intersections between ray and triangle ◦ Accumulates distances to source along ray path • Concurrency problems!
  • 31. Object Order Approach – Problems 1. Concurrency problems on pixel data. ◦ Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects • Figure labels: Concurrent Threads; int index = atomicInc(sharedCounter); Pixel Buffer
  • 32. Object Order Approach – Problems 1. Concurrency problems on pixel data. 2. Still many intersection tests
  • 33. Object Order Approach – Problems 1. Concurrency problems on pixel data. 2. Still many intersection tests 3. Artifacts still hard to avoid or correct
  • 34. Object Order Approach – Results • L3 vertebra model • PA camera – 265 × 137 pixels • GPU time only! • Incomplete implementation • SLOW!
  • 35. Multi-depth Approach - Principle: Assume a Simplification • Discard the Euclidean distance between intersections! • Consider only distance between Fragments, along depth axis! • Figure labels: Source, P1, P2, P’1, P’2, d1, d2
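A minimal sketch of the simplification on slide 35: instead of Euclidean distances between intersection points, only the differences between fragment depths are summed. It assumes the fragment depths of one pixel are already sorted and come in entry/exit pairs; `depthThickness` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// sortedDepths: fragment depths of one pixel in ascending order
// (entry, exit, entry, exit, ...). Returns d1 + d2 + ..., the total
// thickness measured along the depth axis only.
double depthThickness(const std::vector<double>& sortedDepths) {
    assert(sortedDepths.size() % 2 == 0 && "fragments are expected in entry/exit pairs");
    double total = 0.0;
    for (std::size_t i = 0; i + 1 < sortedDepths.size(); i += 2) {
        total += sortedDepths[i + 1] - sortedDepths[i];
    }
    return total;
}
```

For depths 2, 5, 7 and 8 the result is (5 − 2) + (8 − 7) = 4. The value would then feed the same exponential attenuation as before, in place of the Euclidean path length.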
  • 36. Multi-depth Approach - Pipeline • Rasterization done using Scanline+Bresenham algorithm ◦ Filling convention avoids artifacts :) ! • Interpolation in Integer interval ◦ $\mathrm{Depth} = \frac{Z - Z_{min}}{Z_{max} - Z_{min}} \times \mathrm{INT\_MAX}$ • Saving depth in pixel array raises concurrency problems (again)!
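The depth mapping on slide 36 can be written as a small helper. This is a sketch with a hypothetical function name; it assumes Z lies within [Zmin, Zmax] and truncates towards zero when converting to an integer. A plausible reason for working in integers is that CUDA's atomicMin is defined for integer types, though the slides do not state this.

```cpp
#include <cassert>
#include <climits>

// Maps a depth z in [zMin, zMax] to an integer in [0, INT_MAX].
int quantizeDepth(double z, double zMin, double zMax) {
    assert(zMax > zMin && z >= zMin && z <= zMax);
    const double normalized = (z - zMin) / (zMax - zMin);   // in [0, 1]
    return static_cast<int>(normalized * INT_MAX);          // truncates the fraction
}
```

Note that a fragment exactly at Zmax maps to INT_MAX, the same value the depth arrays use as their "empty" marker, so a real implementation would need to keep the two apart.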
  • 37. Multi-depth Approach - Depth array Ordering • atomicMin inserts in right place
      initializeDepthArrays(MAX_INTEGER)
      Znew ← interpolateDepth()
      for i = 0 to DEPTH_ARRAY_SIZE − 1 do
          Zold ← atomicMin(&(getPixelDepthArray(u, v, i)), Znew)
          if Zold == MAX_INTEGER then
              break
          end if
          Znew ← max(Znew, Zold)
      end for
      Reference: Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
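The insertion loop above can be tried out on the host. The sketch below emulates CUDA's atomicMin with a compare-and-swap loop on `std::atomic<int>`; `atomicMinHost` and `insertDepth` are hypothetical names, and the code is a stand-in for experimentation rather than the kernel used in the talk.

```cpp
#include <algorithm>
#include <atomic>
#include <climits>
#include <cstddef>

// Host stand-in for CUDA's atomicMin: stores min(slot, value) and
// returns the value the slot held before the call.
int atomicMinHost(std::atomic<int>& slot, int value) {
    int old = slot.load();
    while (value < old && !slot.compare_exchange_weak(old, value)) {
        // On failure compare_exchange_weak reloads `old`; retry while value is smaller.
    }
    return old;
}

// Inserts zNew into one pixel's depth array, keeping it in ascending order.
// Every slot must start at INT_MAX, which marks it as empty.
// If the array is already full, the largest depth is silently dropped.
void insertDepth(std::atomic<int>* depths, std::size_t size, int zNew) {
    for (std::size_t i = 0; i < size; ++i) {
        const int zOld = atomicMinHost(depths[i], zNew);
        if (zOld == INT_MAX) break;     // slot was empty: the value is placed, done
        zNew = std::max(zNew, zOld);    // carry the larger depth on to the next slot
    }
}
```

Inserting 30, 10 and 20 into a four-slot array initialised to INT_MAX leaves it as 10, 20, 30, INT_MAX: each call behaves like one step of an insertion sort in which the displaced value moves one slot to the right.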
  • 38. Multi-depth Approach - Results • Best time: ◦ 202 × 132 pixels ◦ GPU + CPU time! ◦ Performance With and Without DRR transfer to host! • BETTER!
  • 39. Multi-depth Optimization • Multi-depth allows for an ordered set of depths ◦ More depths ⇒ more atomicMin() calls • We can postpone depth ordering...
      index ← atomicInc(&counter, INT_MAX)
      depthArray[index] ← Znew // RAW-hazard free!
      • depthArray has all the depth values ◦ Ordering can be done on a post-processing kernel!
  • 40. Multi-depth Optimization • Figure labels: Concurrent Threads; int index = atomicInc(sharedCounter); Pixel Buffer
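The postponed-ordering idea can be sketched on the host as two phases: an unordered append that reserves a slot with an atomic counter, and a later sort. `PixelBuffer`, `appendDepth` and `sortDepths` are hypothetical names, and `fetch_add` stands in for CUDA's atomicInc (the two differ only in that atomicInc wraps to zero once the counter reaches its limit).

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-pixel fragment storage filled in arrival order (A-buffer style).
struct PixelBuffer {
    std::atomic<unsigned> counter{0};   // number of slots handed out so far
    std::vector<int> depths;            // fixed-capacity storage
    explicit PixelBuffer(std::size_t capacity) : depths(capacity) {}
};

// Rasterization phase: reserve a unique slot, then write into it.
// No two callers receive the same index, so the writes never collide.
void appendDepth(PixelBuffer& px, int zNew) {
    const unsigned index = px.counter.fetch_add(1);
    assert(index < px.depths.size() && "pixel buffer capacity exceeded");
    px.depths[index] = zNew;
}

// Post-processing phase: order only the slots that were actually filled.
void sortDepths(PixelBuffer& px) {
    std::sort(px.depths.begin(), px.depths.begin() + px.counter.load());
}
```

Compared with the atomicMin insertion, each fragment now costs a single atomic operation during rasterization, and the ordering work moves to a separate pass.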
  • 41. Multi-depth Optimization – Results • A-buffer Scheme Versus GLSL Solution • 202 × 132 pixels
  • 42. Multi-depth Optimization – Results • Better than Current Solution
  • 44. Conclusion • CUDA implementations for DRR extraction ◦ Both pre-processing and main computation tasks ◦ Artifact-free • Single geometry pass • Shared memory model ◦ May be adapted to other technologies • Final implementation shows better performance than GLSL
  • 45. Future Work • There’s a Big Chart to Fill Up...
  • 46. Future Work • Still some artifacts • Memory operations optimizations • Comparisons with other implementations, other geometry models • Build a DRR generation library ◦ possibly an open-source project • Participation in IJUP’11 • Paper preparation for VIPIMAGE 2011. Abstract Deadline: 15th March.
  • 47. Thank You for Listening! Ask Away!