5. Two Applications on GPU Depth of field blur, Michael Kass et al. Shallow water simulation OpenGL and Shader language CUDA Cyclic reduction Cyclic reduction 2006 2007
6.
7.
8. Cyclic Reduction 2-4 threads working Forward Reduction Backward Substitution 8-unkown system 4-unkown system 2-unkown system Solve 2 unkowns Solve the rest 2 unkowns Solve the rest 4 unkonws 2*log2(8)-1 = 2*3 -1 = 5 steps
9. Parallel Cyclic Reduction(PCR) Forward Redution No Backward Substitution One 8-unkown system Two 4-unkown systems Four 2-unkown systems Solve all unkowns 4 threads working log 2 (8)=3 steps
10.
11.
12.
13.
14.
15.
16. Cache Design Buffered Sliding Window Illustration of the buffered sliding window 1. Immedicate results are cached 2.Each tile are processed parallel 3.Each of tile has multiple sub tiles 4.Sub tiles are processed sequentially using cache