Generation of Planar Radiographs from 3D Anatomical Models Using the GPU


André dos Santos Cardoso
Supervisor: Jorge M. G. Barbosa

University of Porto
Faculty of Engineering of University of Porto

11th February, 2011

André Cardoso (andre.cardoso@fe.up.pt), DRR Synthesis Algorithms

Contents

Introduction and Context

CUDA Platform

Input Data

Pre-Processing Steps

Developed Algorithms

Conclusion


DRRs

• Digitally Reconstructed Radiographs – DRRs
• Artificial radiographs taken from vertebrae models

Figure: L3 Vertebra, frontal DRR
Figure: L3 Vertebra, lateral DRR


DRRs – Why?

• Shape recovery of human spine
◦ 100s of DRRs per second
• Scoliosis Evaluation
◦ Alternative to MRIs and CTs

Project’s Objective

Build Fast DRR Algorithms
• Common bottleneck!
◦ Applications in medical area – high throughputs are demanded
• Take advantage of new GPUs and APIs
◦ Common workstations could do the job!


Existing Solution – GLSL

• GLSL implementation – multi-pass working solution
• Depth Peeling Based – Cass Everitt, Interactive Order-Independent Transparency
• Let’s try to enhance its performance!!


Algorithm Concepts

Figure: A ray from the X-ray source crosses the object at points P1, P2, P3 and P4 before reaching the image plane.

Problem! Potential Artifact Generation!

• Each ray traverses the object
◦ Energy is attenuated
\[ \text{PixelColor} = \exp\big( (\lVert P_2 - P_1 \rVert + \lVert P_4 - P_3 \rVert) \times \text{AttenuationFactor} \big) \]
• Common edges may lead to artifact generation!
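
The attenuation formula above can be sketched on the CPU as follows. This is only an illustration, not the author's kernel: the names `Vec3`, `pointDistance` and `pixelColor` are hypothetical, the intersection points are assumed to arrive ordered along the ray as entry/exit pairs, and the attenuation factor is assumed to be negative so that longer paths through the object yield darker pixels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Euclidean distance between two points.
double pointDistance(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// hits: intersections ordered along the ray as (entry, exit) pairs,
//       e.g. P1, P2, P3, P4 in the slide's figure.
// attenuationFactor: assumed negative here (an assumption, not from the slides).
double pixelColor(const std::vector<Vec3>& hits, double attenuationFactor) {
    double pathLength = 0.0;
    for (std::size_t i = 0; i + 1 < hits.size(); i += 2) {
        pathLength += pointDistance(hits[i], hits[i + 1]);  // ||P(i+1) - P(i)||
    }
    return std::exp(pathLength * attenuationFactor);
}
```

A ray that misses the object has an empty hit list and keeps full intensity, `exp(0) = 1`. If a ray grazes an edge shared by two triangles, the same point can be reported twice and the pairing breaks, which is the artifact the slide warns about.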

CUDA Platform

• Compute Unified Device Architecture
◦ Parallel Computing Architecture
◦ Exposes GPU functions and memory
◦ SIMT execution model
◦ Allows hierarchical configuration of threads

• Cheap threads, dozens/hundreds of cores
◦ Thousands of concurrent threads!
• GeForce GT 240
◦ 96 cores
◦ 12288 active threads
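
As a rough illustration of the hierarchical thread configuration, the arithmetic a CUDA launch relies on can be written in plain C++. The function names are hypothetical; in a real kernel the three index values would come from the built-in `blockIdx.x`, `blockDim.x` and `threadIdx.x`.

```cpp
#include <cassert>

// Number of blocks needed so that every work item gets one thread
// (rounded up, so the last block may be partially idle).
unsigned blocksNeeded(unsigned items, unsigned threadsPerBlock) {
    return (items + threadsPerBlock - 1) / threadsPerBlock;
}

// Flat index of a thread inside a 1D grid of 1D blocks.
unsigned globalThreadIndex(unsigned blockIndex, unsigned blockSize, unsigned threadIndex) {
    return blockIndex * blockSize + threadIndex;
}
```

Threads whose global index is at or beyond the item count must return early, since the grid is rounded up to a whole number of blocks.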


CUDA Platform – Threading and Memory


Inputs for Our Algorithms

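• Geometry file – the vertebrae models
• Camera Calibration Matrix

\[
C = \begin{bmatrix} \alpha_u & \lambda & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
P = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\qquad
K = \begin{bmatrix} R & t \\ 0_3^T & 1 \end{bmatrix}
\]

\[
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = C \cdot P \cdot K \cdot \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]

Figure: Pinhole Model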
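
The pinhole projection s·[u v 1]ᵀ = C·P·K·[X Y Z 1]ᵀ can be applied step by step, as in the sketch below. The `Camera` and `Pixel` structs and the `project` function are hypothetical names chosen for illustration; the point is assumed to lie in front of the camera, so the scale factor s is non-zero.

```cpp
#include <cassert>
#include <cmath>

struct Pixel { double u, v; };

struct Camera {
    double alphaU, alphaV, skew, u0, v0;  // entries of C (skew is lambda)
    double f;                             // focal length in P
    double R[3][3];                       // rotation block of K
    double t[3];                          // translation block of K
};

// Projects the world point (X, Y, Z) to pixel coordinates (u, v).
Pixel project(const Camera& cam, double X, double Y, double Z) {
    // K: world coordinates -> camera coordinates (R * X + t).
    const double w[3] = {X, Y, Z};
    double c[3];
    for (int i = 0; i < 3; ++i) {
        c[i] = cam.R[i][0] * w[0] + cam.R[i][1] * w[1] + cam.R[i][2] * w[2] + cam.t[i];
    }
    // P: perspective projection, giving (f*x, f*y, z).
    const double px = cam.f * c[0];
    const double py = cam.f * c[1];
    const double s = c[2];
    // C: camera plane -> pixel grid, still scaled by s.
    const double su = cam.alphaU * px + cam.skew * py + cam.u0 * s;
    const double sv = cam.alphaV * py + cam.v0 * s;
    return {su / s, sv / s};  // divide out the scale factor s
}
```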


Pre-Processing Steps

1. 2D Bounding Box
2. (Projection Source)
3. Ray Direction (for each pixel)
◦ R(t) = O + tD
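
A minimal CPU sketch of two of the pre-processing steps, the 2D bounding box of the projected model and the per-pixel ray R(t) = O + tD, is given below. All names are hypothetical, and the 3D position of each pixel on the image plane is assumed to be already known.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
struct Pixel { double u, v; };
struct BBox { int uMin, vMin, uMax, vMax; };

// Smallest integer pixel rectangle enclosing all projected vertices.
BBox boundingBox2D(const std::vector<Pixel>& projected) {
    assert(!projected.empty());
    double uLo = projected[0].u, uHi = uLo;
    double vLo = projected[0].v, vHi = vLo;
    for (const Pixel& p : projected) {
        uLo = std::min(uLo, p.u); uHi = std::max(uHi, p.u);
        vLo = std::min(vLo, p.v); vHi = std::max(vHi, p.v);
    }
    return {static_cast<int>(std::floor(uLo)), static_cast<int>(std::floor(vLo)),
            static_cast<int>(std::ceil(uHi)), static_cast<int>(std::ceil(vHi))};
}

// Unit direction D from the projection source O towards a pixel's 3D position.
Vec3 rayDirection(const Vec3& source, const Vec3& pixelPosition) {
    const Vec3 d{pixelPosition.x - source.x, pixelPosition.y - source.y,
                 pixelPosition.z - source.z};
    const double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return {d.x / len, d.y / len, d.z / len};
}

// R(t) = O + tD
Vec3 pointOnRay(const Vec3& O, const Vec3& D, double t) {
    return {O.x + t * D.x, O.y + t * D.y, O.z + t * D.z};
}
```

Because D is normalised, the parameter t is the Euclidean distance from the source, which is what the ray-casting approaches accumulate.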



Image Order Approach

1 Thread for Each Pixel
• Thread ⇐⇒ Ray
• Ray Casting!
• Thread loops over ALL triangles
◦ Tests intersections between ray and triangle
◦ Accumulates distances to source along ray path
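
The work of a single pixel thread can be sketched on the CPU as below. The slides do not say which ray/triangle test was used; this sketch assumes the Möller–Trumbore test, and all names are hypothetical. The ray direction is assumed to be a unit vector, so differences of the ray parameter t are distances.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
struct Triangle { Vec3 a, b, c; };

Vec3 sub(const Vec3& p, const Vec3& q) { return {p.x - q.x, p.y - q.y, p.z - q.z}; }
double dot(const Vec3& p, const Vec3& q) { return p.x * q.x + p.y * q.y + p.z * q.z; }
Vec3 cross(const Vec3& p, const Vec3& q) {
    return {p.y * q.z - p.z * q.y, p.z * q.x - p.x * q.z, p.x * q.y - p.y * q.x};
}

// Moller-Trumbore test: true if the ray O + tD hits the triangle; t is set on a hit.
bool intersect(const Vec3& O, const Vec3& D, const Triangle& tri, double& t) {
    const double eps = 1e-12;
    const Vec3 e1 = sub(tri.b, tri.a), e2 = sub(tri.c, tri.a);
    const Vec3 p = cross(D, e2);
    const double det = dot(e1, p);
    if (std::fabs(det) < eps) return false;   // ray parallel to the triangle plane
    const double inv = 1.0 / det;
    const Vec3 s = sub(O, tri.a);
    const double u = dot(s, p) * inv;
    if (u < 0.0 || u > 1.0) return false;
    const Vec3 q = cross(s, e1);
    const double v = dot(D, q) * inv;
    if (v < 0.0 || u + v > 1.0) return false;
    t = dot(e2, q) * inv;
    return t > eps;                            // only hits in front of the source
}

// One "thread": loop over ALL triangles, then sum the entry/exit segment lengths.
double pathLengthInside(const Vec3& O, const Vec3& D, const std::vector<Triangle>& mesh) {
    std::vector<double> hits;
    for (const Triangle& tri : mesh) {
        double t = 0.0;
        if (intersect(O, D, tri, t)) hits.push_back(t);
    }
    std::sort(hits.begin(), hits.end());
    double length = 0.0;
    for (std::size_t i = 0; i + 1 < hits.size(); i += 2) length += hits[i + 1] - hits[i];
    return length;
}
```

Every pixel pays for a test against every triangle, even though most tests fail, which is the cost the next slide points out.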


Image Order Approach – Problems

1. Many threads looping over many triangles
2. Useless intersection tests – heavy operations!
3. Artifacts – hard to take care of!

L3 Vertebra Model
• 776 vertices, 1552 triangles
• PA perspective: 266 × 138 pixels = 36708 threads

Image Order Approach – Results

• L3 vertebra model
• PA camera – 265 × 137 pixels
• GPU time only!
• Incomplete implementation

SLOW!


Object Order Approach

1 Thread for Each Triangle
• Ray Casting!
• Threads spawned for each triangle
◦ Reverse the approach of the former algorithm!
• Thread loops over each pixel covered by the triangle bounding box
◦ Tests intersections between ray and triangle
◦ Accumulates distances to source along ray path
• Concurrency problems!
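
The pixel loop of one triangle thread might look like the sketch below, which only enumerates the candidate pixels inside the triangle's bounding box. The names are hypothetical, the triangle is assumed to be already projected to pixel coordinates, and the intersection test and accumulation are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct Pixel { double u, v; };                 // projected vertex, in pixel units
struct ProjectedTriangle { Pixel a, b, c; };

// Returns the (u, v) pixels inside the triangle's 2D bounding box, clamped to a
// width x height image. One thread would walk this list for its own triangle.
std::vector<std::pair<int, int>> candidatePixels(const ProjectedTriangle& t,
                                                 int width, int height) {
    const int uMin = std::max(0, static_cast<int>(std::floor(std::min({t.a.u, t.b.u, t.c.u}))));
    const int uMax = std::min(width - 1, static_cast<int>(std::ceil(std::max({t.a.u, t.b.u, t.c.u}))));
    const int vMin = std::max(0, static_cast<int>(std::floor(std::min({t.a.v, t.b.v, t.c.v}))));
    const int vMax = std::min(height - 1, static_cast<int>(std::ceil(std::max({t.a.v, t.b.v, t.c.v}))));

    std::vector<std::pair<int, int>> pixels;
    for (int v = vMin; v <= vMax; ++v) {
        for (int u = uMin; u <= uMax; ++u) {
            pixels.emplace_back(u, v);         // ray/triangle test would go here
        }
    }
    return pixels;
}
```

Two triangles whose bounding boxes overlap produce the same pixel in both lists, so their threads may update the same pixel at the same time. That is the concurrency problem named in the last bullet.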


Object Order Approach – Problems

1. Concurrency problems on pixel data.
◦ Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
2. Still many intersection tests
3. Artifacts still hard to avoid or correct

Figure: Concurrent threads each reserve a slot with int index = atomicInc(sharedCounter); and then write into the pixel buffer.

Object Order Approach – Results

• L3 vertebra model
• PA camera – 265 × 137 pixels
• GPU time only!
• Incomplete implementation

SLOW!


Multi-depth Approach - Principle

Assume a Simplification
• Discard the Euclidean distance between intersections!
• Consider only distance between Fragments, along depth axis!!

Figure: Seen from the source, fragments P1 and P2 are separated by depth d1, and fragments P'1 and P'2 by depth d2.
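
The simplification can be made concrete with a small comparison, sketched below under the assumption that the depth axis is z. The function names are hypothetical. The last function shows how a pixel's thickness follows from its depth values once they are ordered.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Exact length of the segment between two intersections.
double euclideanLength(const Vec3& p1, const Vec3& p2) {
    const double dx = p2.x - p1.x, dy = p2.y - p1.y, dz = p2.z - p1.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Simplified length: only the difference along the depth axis (assumed to be z).
double depthLength(const Vec3& p1, const Vec3& p2) {
    return std::fabs(p2.z - p1.z);
}

// Sum of (exit - entry) over consecutive pairs of the ordered depth values.
double thicknessFromDepths(std::vector<double> depths) {
    std::sort(depths.begin(), depths.end());
    double thickness = 0.0;
    for (std::size_t i = 0; i + 1 < depths.size(); i += 2) {
        thickness += depths[i + 1] - depths[i];
    }
    return thickness;
}
```

For a ray that is not parallel to the depth axis the two lengths differ, so the simplified value slightly underestimates the true path length.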


Multi-depth Approach - Pipeline

• Rasterization done using Scanline + Bresenham algorithm
◦ Filling convention avoids artifacts :) !

• Interpolation in Integer interval
◦ \( \text{Depth} = \dfrac{Z - Z_{min}}{Z_{max} - Z_{min}} \times \text{INT\_MAX} \)

• Saving depth in the pixel array raises concurrency problems (again)!
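
The integer depth mapping can be sketched as follows. The function name is hypothetical, and the clamping of out-of-range values is an added safeguard that the slides do not mention.

```cpp
#include <cassert>
#include <climits>

// Maps z in [zMin, zMax] to an integer in [0, INT_MAX], so that depths can be
// compared and updated with integer atomic operations.
int quantizeDepth(double z, double zMin, double zMax) {
    assert(zMax > zMin);
    double normalized = (z - zMin) / (zMax - zMin);
    if (normalized < 0.0) normalized = 0.0;   // clamp: assumption, not in the slides
    if (normalized > 1.0) normalized = 1.0;
    return static_cast<int>(normalized * INT_MAX);
}
```

The mapping is monotonic, so ordering the integers orders the original depths as well.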


Multi-depth Approach - Depth array Ordering

atomicMin inserts in right place
initializeDepthArrays(MAX_INTEGER)
Znew ← interpolateDepth()
for i = 0 to DEPTH_ARRAY_SIZE − 1 do
    Zold ← atomicMin(&(getPixelDepthArray(u, v, i)), Znew)
    if Zold == MAX_INTEGER then
        break
    end if
    Znew ← max(Znew, Zold)
end for

• Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
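
The insertion loop above can be emulated on the CPU with `std::atomic`, as in this sketch. `atomicMinInt` is a hypothetical stand-in for CUDA's `atomicMin`, built from a compare-and-swap loop; the array size of 4 is an arbitrary choice for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <climits>
#include <cstddef>

constexpr std::size_t kDepthArraySize = 4;   // hypothetical size for the example
constexpr int kEmpty = INT_MAX;              // marks an unused slot

// Stand-in for CUDA atomicMin: stores min(slot, value) and returns the old value.
int atomicMinInt(std::atomic<int>& slot, int value) {
    int old = slot.load();
    while (value < old && !slot.compare_exchange_weak(old, value)) {
        // compare_exchange_weak refreshed 'old'; retry while value is still smaller
    }
    return old;
}

void initializeDepthArray(std::atomic<int>* depths) {
    for (std::size_t i = 0; i < kDepthArraySize; ++i) depths[i].store(kEmpty);
}

// Inserts zNew so that the pixel's depth array stays in ascending order.
void insertDepth(std::atomic<int>* depths, int zNew) {
    for (std::size_t i = 0; i < kDepthArraySize; ++i) {
        const int zOld = atomicMinInt(depths[i], zNew);
        if (zOld == kEmpty) break;           // slot was free: the value is placed
        zNew = std::max(zNew, zOld);         // carry the larger value to the next slot
    }
}
```

Each slot keeps the smaller of the two values and the larger one moves on, so a fragment may need one atomic operation per stored depth.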

Multi-depth Approach - Results
• Best time:
◦ 202 × 132 pixels
◦ GPU + CPU time!

◦ Performance with and without DRR transfer to host!

BETTER!

Multi-depth Optimization

• Multi-depth allows for an ordered set of depths
◦ More depths =⇒ more atomicMin() calls

We can postpone depth ordering...
index ← atomicInc(&counter, INT_MAX)
depthArray[index] ← Znew // RAW-hazard free!

• depthArray has all the depth values;
◦ Ordering can be done on a post-processing kernel!!!
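
A CPU sketch of the postponed ordering follows. The names are hypothetical, and `fetch_add` stands in for CUDA's `atomicInc`; unlike `atomicInc`, it does not wrap back to zero at a limit, so the sketch checks the capacity explicitly.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

struct PixelFragments {
    std::atomic<unsigned> counter{0};   // next free slot
    std::vector<int> depths;            // preallocated storage for this pixel
    explicit PixelFragments(std::size_t capacity) : depths(capacity) {}
};

// Rasterization step: reserve a unique slot, then write the depth unordered.
// Every writer gets a different index, so no two writes touch the same element.
bool appendDepth(PixelFragments& px, int zNew) {
    const unsigned index = px.counter.fetch_add(1);
    if (index >= px.depths.size()) return false;   // buffer full, fragment dropped
    px.depths[index] = zNew;
    return true;
}

// Post-processing step: order only the slots that were actually filled.
void sortDepths(PixelFragments& px) {
    const std::size_t used = std::min<std::size_t>(px.counter.load(), px.depths.size());
    std::sort(px.depths.begin(), px.depths.begin() + static_cast<std::ptrdiff_t>(used));
}
```

Each fragment now costs a single atomic operation, and the sorting cost moves to a separate pass that runs once per pixel.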

Figure: Concurrent threads each reserve a slot with int index = atomicInc(sharedCounter); and then write into the pixel buffer.

Multi-depth Optimization – Results
• A-buffer Scheme Versus GLSL Solution
• 202 × 132 pixels


Better than Current Solution


Conclusion

• CUDA implementations for DRR extraction
◦ Both pre-processing and main computation tasks
◦ Artifact-free
• Single geometry pass
• Shared memory model
◦ May be adapted to other technologies
• Final implementation shows better performance than GLSL


Future Work

There’s a Big Chart to Fill Up...

• Still some artifacts
• Memory operations optimizations
• Comparisons with other implementations, other geometry models
• Build a DRR generation library
◦ possibly an open-source project
• Participation in IJUP'11
• Paper preparation for VIPIMAGE 2011. Abstract deadline: 15th March.


Thank You for Listening!
Ask Away!
