SlideShare une entreprise Scribd logo
1  sur  26
A Scalable Tridiagonal Solver    For GPUs Team:WenMin Xiao&ChaoQun Li Institute of information science and  technology of Hunan University
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What is a tridiagonal system?
What is it used for? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Two Applications on GPU Depth of field blur, Michael Kass et al. Shallow water simulation OpenGL and Shader language  CUDA Cyclic reduction Cyclic reduction 2006 2007
A Classic Serial Algorithm ,[object Object],Phase 1:Forword Reduction Phase 2:Backward Substitution Elimination steps? Complexity? 2n-1 O(n)=2(n-1)+1
Parallel Algorithms ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],A set of equations mapped to one thread A single equation mapped to one thread
Cyclic Reduction 2-4  threads working Forward Reduction Backward Substitution 8-unkown system 4-unkown system 2-unkown system Solve 2 unkowns Solve the rest 2 unkowns Solve the rest 4 unkonws 2*log2(8)-1 = 2*3 -1 = 5 steps
Parallel Cyclic Reduction(PCR) Forward Redution No Backward Substitution One 8-unkown system Two 4-unkown systems Four 2-unkown systems Solve all unkowns 4  threads working log 2 (8)=3 steps
Advantages of Previous Algorithms ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hybird Algorithm ,[object Object],[object Object],One 8-unkown system One PCR step Parallel Thomas
GPU Implementation ,[object Object],[object Object],[object Object],[object Object]
Tiled PCR ,[object Object],Redundancy of  naive tiling  of PCR ,[object Object],[object Object],[object Object]
Dependency & Parallelism How to Reduce Redundancy? ,[object Object],[object Object],Solution 1 Redundancy is also exist!
Dependency & Parallelism cont Fine-grained tiling ,[object Object],[object Object],Solution 2 Without redundancy Sequential   Computation
Cache Design Buffered Sliding Window Illustration of the buffered sliding window 1. Immedicate   results  are cached 2.Each tile are processed  parallel 3.Each of tile has multiple sub tiles 4.Sub tiles are processed  sequentially  using cache
Components of Buffered Sliding Window ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example
Advantages of TPCR ,[object Object],[object Object],[object Object],[object Object]
Thread-level Parallel  Thomas Algorithm ,[object Object],[object Object],64B aligned segment 128B aligned segment
Performance Evaluation Test-Platform ,[object Object],[object Object],[object Object],[object Object]
Performance Results Parameter  M  and  N : number of systems and system size 8.3x and 49x speedups 5x and 30x speedups
Performance Analysis ,[object Object],[object Object],[object Object],[object Object],[object Object]
Summary ,[object Object],[object Object]
Reference ,[object Object],[object Object],[object Object],[object Object]
Question? Thanks

Contenu connexe

Tendances

Xian He Sun Data-Centric Into
Xian He Sun Data-Centric IntoXian He Sun Data-Centric Into
Xian He Sun Data-Centric IntoSciCompIIT
 
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...Hideyuki Tanaka
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoSRohit Jnagal
 
Identifying Optimal Trade-Offs between CPU Time Usage and Temporal Constraints
Identifying Optimal Trade-Offs between CPU Time Usage and Temporal ConstraintsIdentifying Optimal Trade-Offs between CPU Time Usage and Temporal Constraints
Identifying Optimal Trade-Offs between CPU Time Usage and Temporal ConstraintsLionel Briand
 
Time space trade off
Time space trade offTime space trade off
Time space trade offanisha talwar
 
Java/Scala Lab 2016. Владимир Гарбуз: Написание безопасного кода на Java.
Java/Scala Lab 2016. Владимир Гарбуз: Написание безопасного кода на Java.Java/Scala Lab 2016. Владимир Гарбуз: Написание безопасного кода на Java.
Java/Scala Lab 2016. Владимир Гарбуз: Написание безопасного кода на Java.GeeksLab Odessa
 
Introduction to Cache-Oblivious Algorithms
Introduction to Cache-Oblivious AlgorithmsIntroduction to Cache-Oblivious Algorithms
Introduction to Cache-Oblivious AlgorithmsChristopher Gilbert
 
An area efficient relaxed half-stochastic decoding architecture for nonbinary...
An area efficient relaxed half-stochastic decoding architecture for nonbinary...An area efficient relaxed half-stochastic decoding architecture for nonbinary...
An area efficient relaxed half-stochastic decoding architecture for nonbinary...LogicMindtech Nologies
 
Integer quantization for deep learning inference: principles and empirical ev...
Integer quantization for deep learning inference: principles and empirical ev...Integer quantization for deep learning inference: principles and empirical ev...
Integer quantization for deep learning inference: principles and empirical ev...jemin lee
 
Xdp and ebpf_maps
Xdp and ebpf_mapsXdp and ebpf_maps
Xdp and ebpf_mapslcplcp1
 
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SHow I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SBrandon Liu
 
Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Hajime Tazaki
 
Multicore programmingandtpl
Multicore programmingandtplMulticore programmingandtpl
Multicore programmingandtplYan Drugalya
 
Hs java open_party
Hs java open_partyHs java open_party
Hs java open_partyOpen Party
 
Kernelvm 201312-dlmopen
Kernelvm 201312-dlmopenKernelvm 201312-dlmopen
Kernelvm 201312-dlmopenHajime Tazaki
 
Programming Trends in High Performance Computing
Programming Trends in High Performance ComputingProgramming Trends in High Performance Computing
Programming Trends in High Performance ComputingJuris Vencels
 

Tendances (20)

Cat @ scale
Cat @ scaleCat @ scale
Cat @ scale
 
Xian He Sun Data-Centric Into
Xian He Sun Data-Centric IntoXian He Sun Data-Centric Into
Xian He Sun Data-Centric Into
 
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
 
Nicpaper2009
Nicpaper2009Nicpaper2009
Nicpaper2009
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoS
 
Identifying Optimal Trade-Offs between CPU Time Usage and Temporal Constraints
Identifying Optimal Trade-Offs between CPU Time Usage and Temporal ConstraintsIdentifying Optimal Trade-Offs between CPU Time Usage and Temporal Constraints
Identifying Optimal Trade-Offs between CPU Time Usage and Temporal Constraints
 
Time space trade off
Time space trade offTime space trade off
Time space trade off
 
Java/Scala Lab 2016. Владимир Гарбуз: Написание безопасного кода на Java.
Java/Scala Lab 2016. Владимир Гарбуз: Написание безопасного кода на Java.Java/Scala Lab 2016. Владимир Гарбуз: Написание безопасного кода на Java.
Java/Scala Lab 2016. Владимир Гарбуз: Написание безопасного кода на Java.
 
Introduction to Cache-Oblivious Algorithms
Introduction to Cache-Oblivious AlgorithmsIntroduction to Cache-Oblivious Algorithms
Introduction to Cache-Oblivious Algorithms
 
An area efficient relaxed half-stochastic decoding architecture for nonbinary...
An area efficient relaxed half-stochastic decoding architecture for nonbinary...An area efficient relaxed half-stochastic decoding architecture for nonbinary...
An area efficient relaxed half-stochastic decoding architecture for nonbinary...
 
Integer quantization for deep learning inference: principles and empirical ev...
Integer quantization for deep learning inference: principles and empirical ev...Integer quantization for deep learning inference: principles and empirical ev...
Integer quantization for deep learning inference: principles and empirical ev...
 
Xdp and ebpf_maps
Xdp and ebpf_mapsXdp and ebpf_maps
Xdp and ebpf_maps
 
mTCP使ってみた
mTCP使ってみたmTCP使ってみた
mTCP使ってみた
 
Ch5 answers
Ch5 answersCh5 answers
Ch5 answers
 
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SHow I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
 
Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014
 
Multicore programmingandtpl
Multicore programmingandtplMulticore programmingandtpl
Multicore programmingandtpl
 
Hs java open_party
Hs java open_partyHs java open_party
Hs java open_party
 
Kernelvm 201312-dlmopen
Kernelvm 201312-dlmopenKernelvm 201312-dlmopen
Kernelvm 201312-dlmopen
 
Programming Trends in High Performance Computing
Programming Trends in High Performance ComputingProgramming Trends in High Performance Computing
Programming Trends in High Performance Computing
 

En vedette

Linked in series b pitch
Linked in series b pitchLinked in series b pitch
Linked in series b pitchUzoma Ikechukwu
 
Pisa sokk
Pisa sokkPisa sokk
Pisa sokktotyik
 
互联网人类学研究室
互联网人类学研究室互联网人类学研究室
互联网人类学研究室denghao12436
 
Mantas of maldives part 1
Mantas of maldives part 1Mantas of maldives part 1
Mantas of maldives part 1Abdulla Shafeeg
 
互联网人类学研究室
互联网人类学研究室互联网人类学研究室
互联网人类学研究室denghao12436
 
Ensayo dominio público
Ensayo dominio públicoEnsayo dominio público
Ensayo dominio públicoCONZAGA C.A.
 
Deadgirl_horror film
Deadgirl_horror filmDeadgirl_horror film
Deadgirl_horror filmdcjvicta
 
Előadás
ElőadásElőadás
Előadástotyik
 
úJ nemzedék
úJ nemzedékúJ nemzedék
úJ nemzedéktotyik
 
[14 10-2011 16-19_32]ds_du_thi_xep_lop_16_10_11
[14 10-2011 16-19_32]ds_du_thi_xep_lop_16_10_11[14 10-2011 16-19_32]ds_du_thi_xep_lop_16_10_11
[14 10-2011 16-19_32]ds_du_thi_xep_lop_16_10_11xuanhieu123445
 
Látlelet a magyarországi szegénységről
Látlelet a magyarországi szegénységrőlLátlelet a magyarországi szegénységről
Látlelet a magyarországi szegénységrőltotyik
 
Iskolai előadás pedagógus és intézményi ellenőrzésről
Iskolai előadás pedagógus és intézményi ellenőrzésrőlIskolai előadás pedagógus és intézményi ellenőrzésről
Iskolai előadás pedagógus és intézményi ellenőrzésrőltotyik
 
Totyik tbemutatóóra
Totyik tbemutatóóraTotyik tbemutatóóra
Totyik tbemutatóóratotyik
 

En vedette (15)

Ch026
Ch026Ch026
Ch026
 
Linked in series b pitch
Linked in series b pitchLinked in series b pitch
Linked in series b pitch
 
Pisa sokk
Pisa sokkPisa sokk
Pisa sokk
 
互联网人类学研究室
互联网人类学研究室互联网人类学研究室
互联网人类学研究室
 
Mantas of maldives part 1
Mantas of maldives part 1Mantas of maldives part 1
Mantas of maldives part 1
 
互联网人类学研究室
互联网人类学研究室互联网人类学研究室
互联网人类学研究室
 
Ensayo dominio público
Ensayo dominio públicoEnsayo dominio público
Ensayo dominio público
 
Deadgirl_horror film
Deadgirl_horror filmDeadgirl_horror film
Deadgirl_horror film
 
Előadás
ElőadásElőadás
Előadás
 
úJ nemzedék
úJ nemzedékúJ nemzedék
úJ nemzedék
 
[14 10-2011 16-19_32]ds_du_thi_xep_lop_16_10_11
[14 10-2011 16-19_32]ds_du_thi_xep_lop_16_10_11[14 10-2011 16-19_32]ds_du_thi_xep_lop_16_10_11
[14 10-2011 16-19_32]ds_du_thi_xep_lop_16_10_11
 
Látlelet a magyarországi szegénységről
Látlelet a magyarországi szegénységrőlLátlelet a magyarországi szegénységről
Látlelet a magyarországi szegénységről
 
Manual guardar agua chuva unhabitat
Manual guardar agua chuva unhabitatManual guardar agua chuva unhabitat
Manual guardar agua chuva unhabitat
 
Iskolai előadás pedagógus és intézményi ellenőrzésről
Iskolai előadás pedagógus és intézményi ellenőrzésrőlIskolai előadás pedagógus és intézményi ellenőrzésről
Iskolai előadás pedagógus és intézményi ellenőrzésről
 
Totyik tbemutatóóra
Totyik tbemutatóóraTotyik tbemutatóóra
Totyik tbemutatóóra
 

Similaire à Tridiagonal solver in gpu

Exploring hybrid memory for gpu energy efficiency through software hardware c...
Exploring hybrid memory for gpu energy efficiency through software hardware c...Exploring hybrid memory for gpu energy efficiency through software hardware c...
Exploring hybrid memory for gpu energy efficiency through software hardware c...Cheng-Hsuan Li
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesDilum Bandara
 
ExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent AcceleratorExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent AcceleratorJinho Lee
 
Cache Optimization Techniques for General Purpose Graphic Processing Units
Cache Optimization Techniques for General Purpose Graphic Processing UnitsCache Optimization Techniques for General Purpose Graphic Processing Units
Cache Optimization Techniques for General Purpose Graphic Processing UnitsVajira Thambawita
 
GPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print ImagingGPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print ImagingAMD
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
Unit II Arm 7 Introduction
Unit II Arm 7 IntroductionUnit II Arm 7 Introduction
Unit II Arm 7 IntroductionDr. Pankaj Zope
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Cisco crs1
Cisco crs1Cisco crs1
Cisco crs1wjunjmt
 
Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...
Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...
Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...Codemotion
 
Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Shien-Chun Luo
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
Atc On An Simd Cots System Wmpp05
Atc On An Simd Cots System   Wmpp05Atc On An Simd Cots System   Wmpp05
Atc On An Simd Cots System Wmpp05Ülger Ahmet
 
Nilesh ranpura systemmodelling
Nilesh ranpura systemmodellingNilesh ranpura systemmodelling
Nilesh ranpura systemmodellingObsidian Software
 

Similaire à Tridiagonal solver in gpu (20)

Exploring hybrid memory for gpu energy efficiency through software hardware c...
Exploring hybrid memory for gpu energy efficiency through software hardware c...Exploring hybrid memory for gpu energy efficiency through software hardware c...
Exploring hybrid memory for gpu energy efficiency through software hardware c...
 
26_Fan.pdf
26_Fan.pdf26_Fan.pdf
26_Fan.pdf
 
CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching Techniques
 
Packet sniffing
Packet sniffingPacket sniffing
Packet sniffing
 
ExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent AcceleratorExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
 
Cache Optimization Techniques for General Purpose Graphic Processing Units
Cache Optimization Techniques for General Purpose Graphic Processing UnitsCache Optimization Techniques for General Purpose Graphic Processing Units
Cache Optimization Techniques for General Purpose Graphic Processing Units
 
GPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print ImagingGPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print Imaging
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
L05 parallel
L05 parallelL05 parallel
L05 parallel
 
Unit II Arm 7 Introduction
Unit II Arm 7 IntroductionUnit II Arm 7 Introduction
Unit II Arm 7 Introduction
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Cisco crs1
Cisco crs1Cisco crs1
Cisco crs1
 
Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...
Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...
Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...
 
Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
Dasia 2022
Dasia 2022Dasia 2022
Dasia 2022
 
4g lte matlab
4g lte matlab4g lte matlab
4g lte matlab
 
Atc On An Simd Cots System Wmpp05
Atc On An Simd Cots System   Wmpp05Atc On An Simd Cots System   Wmpp05
Atc On An Simd Cots System Wmpp05
 
Nilesh ranpura systemmodelling
Nilesh ranpura systemmodellingNilesh ranpura systemmodelling
Nilesh ranpura systemmodelling
 

Dernier

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Dernier (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Tridiagonal solver in gpu

  • 1. A Scalable Tridiagonal Solver For GPUs Team:WenMin Xiao&ChaoQun Li Institute of information science and technology of Hunan University
  • 2.
  • 3. What is a tridiagonal system?
  • 4.
  • 5. Two Applications on GPU Depth of field blur, Michael Kass et al. Shallow water simulation OpenGL and Shader language CUDA Cyclic reduction Cyclic reduction 2006 2007
  • 6.
  • 7.
  • 8. Cyclic Reduction 2-4 threads working Forward Reduction Backward Substitution 8-unkown system 4-unkown system 2-unkown system Solve 2 unkowns Solve the rest 2 unkowns Solve the rest 4 unkonws 2*log2(8)-1 = 2*3 -1 = 5 steps
  • 9. Parallel Cyclic Reduction(PCR) Forward Redution No Backward Substitution One 8-unkown system Two 4-unkown systems Four 2-unkown systems Solve all unkowns 4 threads working log 2 (8)=3 steps
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16. Cache Design Buffered Sliding Window Illustration of the buffered sliding window 1. Immedicate results are cached 2.Each tile are processed parallel 3.Each of tile has multiple sub tiles 4.Sub tiles are processed sequentially using cache
  • 17.
  • 19.
  • 20.
  • 21.
  • 22. Performance Results Parameter M and N : number of systems and system size 8.3x and 49x speedups 5x and 30x speedups
  • 23.
  • 24.
  • 25.