SlideShare une entreprise Scribd logo
1  sur  20
Télécharger pour lire hors ligne
GPU Computing with Ruby



      SpeedGo Computing

         Chung Shin Yee
 shinyee@speedgocomputing.com
CPU vs GPU Architecture
        6 Core vs 1024 Core
6 GB/s vs 300 GB/s Memory Bandwidth




       By CUDA C Programming Guide
CUDA Programming Model



                              .
                              .
                              .
                              .




By CUDA C Programming Guide
Existing Programming Tools
●   Cg
●   BrookGPU
●   GLSL (OpenGL Shading Language)
●   Nvidia CUDA C/C++
●   OpenCL
●   PyCUDA     Where is the Red Ruby ?
Bridging Ruby & CUDA C/C++
●   Ruby C extension
       –   Hard to manipulate Ruby objects in C.
       –   Compilation problems.
●   Ruby FFI
       –   Bridging purely in Ruby.
       –   Support multiple Ruby implementations.
Ruby Bridge Sample
Developing SGC Ruby CUDA
●   Object-oriented API.
●   Start with crucial operations.
       –   Memory allocation.
       –   Memory transfer.
       –   Kernel launch.
       –   Wrapper for structures.
●   Documented with YARD.
Driver vs Runtime API
●   CUDA Driver API
      –   For system developers.
      –   Supported by PyCUDA.
●   CUDA Runtime API
      –   For computation centric developers.


          We going to support both API !
Using SGC Ruby CUDA
●   Kernel program in CUDA C.
Using SGC Ruby CUDA
●   Compiling kernel into PTX.
       –   nvcc --ptx vadd.cu
Using SGC Ruby CUDA
●   Setup
        require 'rubycu'
        include SGC::CU
        CUInit.init
        d = CUDevice.get(0)
        c = CUContext.create(d)
        m = CUModule.new.load(“vadd.ptx”)
        f = m.function(“vadd”)
Using SGC Ruby CUDA
●   Memory allocations
        da = CUDevice.malloc(10*4)
        db = CUDevice.malloc(10*4)
        dc = CUDevice.malloc(10*4)
        ha = Buffer.new(:int, 10)
        hb = Buffer.new(:int, 10)
        hc = Buffer.new(:int, 10)
Using SGC Ruby CUDA
●   Initialization
         (0...10).each { |i|
                ha[i] = i
                hb[i] = 1
                hc[i] = ha[i] + hb[i]
                hd[i] = 0
         }
Using SGC Ruby CUDA
●   Transfer inputs to the GPU
        CUMemory.memcpy_htod(da, ha, 4*10)
        CUMemory.memcpy_htod(db, hb, 4*10)
        CUMemory.memcpy_htod(dc, hc, 4*10)
Using SGC Ruby CUDA
●    Launch kernel on GPU
            # Launch with 1x1x1 grid,
            # 10x1x1 blocks,
            params = [da, db, dc, 10]
            f.launch_kernel(1, 1, 1, 10, 1, 1, 0, 0, params)




    By CUDA C Programming Guide       By CUDA C Programming Guide
Using SGC Ruby CUDA
●   Transfer results back to system memory
         CUMemory.memcpy_dtoh(hd, dc, 4*10)
●   Verify results
         (0...10).each { |i|
               assert_equal(hc[i], hd[i])
         }
Problematic CUDA Runtime API
●   For use in a CUDA C/C++ program.
●   Workaround
       –   CUDA C/C++ effectively uses C/C++
            bindings.
       –   Create dynamic library for the kernel
            programs.
       –   Load the library at runtime.
Current Limitations
●   Support limited data types.
       –   Fixnum   → int
       –   ??       → long
       –   Float    → float
       –   ??       → double
●   No supports for CUDA C++ templates.
●   No Ruby in a kernel program.
To Support
●   Texture memory.
●   New features in CUDA 4.0
       –   Multi-GPU.
       –   Unified Virtual Memory.
●   More C data types.
●   Mac platform.
Try It Now! Thank You ~
git clone git://github.com/xman/sgc-ruby-cuda.git
cd sgc-ruby-cuda
gem install ffi yard
rake test
rake yard

Contenu connexe

Tendances

Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08
Angela Mendoza M.
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
DefCamp
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with Cuda
Rob Gillen
 

Tendances (20)

Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
 
Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance Computing
 
GPU: Understanding CUDA
GPU: Understanding CUDAGPU: Understanding CUDA
GPU: Understanding CUDA
 
CUDA
CUDACUDA
CUDA
 
NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009
 
Introduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : NotesIntroduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : Notes
 
Cuda tutorial
Cuda tutorialCuda tutorial
Cuda tutorial
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
 
Debugging CUDA applications
Debugging CUDA applicationsDebugging CUDA applications
Debugging CUDA applications
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
How to Burn Multi-GPUs using CUDA stress test memo
How to Burn Multi-GPUs using CUDA stress test memoHow to Burn Multi-GPUs using CUDA stress test memo
How to Burn Multi-GPUs using CUDA stress test memo
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
 
qCUDA-ARM : Virtualization for Embedded GPU Architectures
 qCUDA-ARM : Virtualization for Embedded GPU Architectures  qCUDA-ARM : Virtualization for Embedded GPU Architectures
qCUDA-ARM : Virtualization for Embedded GPU Architectures
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with Cuda
 
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
 
Cuda intro
Cuda introCuda intro
Cuda intro
 
Computing using GPUs
Computing using GPUsComputing using GPUs
Computing using GPUs
 
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
 

En vedette

Computación paralela con gp us cuda
Computación paralela con gp us cudaComputación paralela con gp us cuda
Computación paralela con gp us cuda
Javier Zarco
 
FAST MAP PROJECTION ON CUDA.ppt
FAST MAP PROJECTION ON CUDA.pptFAST MAP PROJECTION ON CUDA.ppt
FAST MAP PROJECTION ON CUDA.ppt
grssieee
 
CPU vs. GPU presentation
CPU vs. GPU presentationCPU vs. GPU presentation
CPU vs. GPU presentation
Vishal Singh
 

En vedette (9)

Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1
Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1
Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1
 
Hardware Acceleration of Computional Fluid Dynamics SImulations in an Oxygena...
Hardware Acceleration of Computional Fluid Dynamics SImulations in an Oxygena...Hardware Acceleration of Computional Fluid Dynamics SImulations in an Oxygena...
Hardware Acceleration of Computional Fluid Dynamics SImulations in an Oxygena...
 
Sun Tzu: The Art Of...Business?
Sun Tzu: The Art Of...Business?Sun Tzu: The Art Of...Business?
Sun Tzu: The Art Of...Business?
 
Equipo 2 gpus
Equipo 2 gpusEquipo 2 gpus
Equipo 2 gpus
 
Computación paralela con gp us cuda
Computación paralela con gp us cudaComputación paralela con gp us cuda
Computación paralela con gp us cuda
 
FAST MAP PROJECTION ON CUDA.ppt
FAST MAP PROJECTION ON CUDA.pptFAST MAP PROJECTION ON CUDA.ppt
FAST MAP PROJECTION ON CUDA.ppt
 
OpenHPI - Parallel Programming Concepts - Week 4
OpenHPI - Parallel Programming Concepts - Week 4OpenHPI - Parallel Programming Concepts - Week 4
OpenHPI - Parallel Programming Concepts - Week 4
 
CPU vs. GPU presentation
CPU vs. GPU presentationCPU vs. GPU presentation
CPU vs. GPU presentation
 
GPU - An Introduction
GPU - An IntroductionGPU - An Introduction
GPU - An Introduction
 

Similaire à GPU Computing with Ruby

CUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesCUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : Notes
Subhajit Sahu
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
Arka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
Arka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
Arka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
Arka Ghosh
 
Pycon2014 GPU computing
Pycon2014 GPU computingPycon2014 GPU computing
Pycon2014 GPU computing
Ashwin Ashok
 

Similaire à GPU Computing with Ruby (20)

lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPU
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
CUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesCUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : Notes
 
Getting started with AMD GPUs
Getting started with AMD GPUsGetting started with AMD GPUs
Getting started with AMD GPUs
 
Hybrid Map Task Scheduling for GPU-based Heterogeneous Clusters
Hybrid Map Task Scheduling for GPU-based Heterogeneous ClustersHybrid Map Task Scheduling for GPU-based Heterogeneous Clusters
Hybrid Map Task Scheduling for GPU-based Heterogeneous Clusters
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
DockerCon EU '17 - Dockerizing Aurea
DockerCon EU '17 - Dockerizing AureaDockerCon EU '17 - Dockerizing Aurea
DockerCon EU '17 - Dockerizing Aurea
 
[HKOSCon x COSCUP 2020][20200801][Ansible: From VM to Kubernetes]
[HKOSCon x COSCUP 2020][20200801][Ansible: From VM to Kubernetes][HKOSCon x COSCUP 2020][20200801][Ansible: From VM to Kubernetes]
[HKOSCon x COSCUP 2020][20200801][Ansible: From VM to Kubernetes]
 
Introduction to cuda geek camp singapore 2011
Introduction to cuda   geek camp singapore 2011Introduction to cuda   geek camp singapore 2011
Introduction to cuda geek camp singapore 2011
 
lecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdflecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdf
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
NVIDIA GTC 2019: Red Hat and the NVIDIA DGX: Tried, Tested, Trusted
NVIDIA GTC 2019:  Red Hat and the NVIDIA DGX: Tried, Tested, TrustedNVIDIA GTC 2019:  Red Hat and the NVIDIA DGX: Tried, Tested, Trusted
NVIDIA GTC 2019: Red Hat and the NVIDIA DGX: Tried, Tested, Trusted
 
Pycon2014 GPU computing
Pycon2014 GPU computingPycon2014 GPU computing
Pycon2014 GPU computing
 
Building a continuous delivery platform for the biggest spike in e-commerce -...
Building a continuous delivery platform for the biggest spike in e-commerce -...Building a continuous delivery platform for the biggest spike in e-commerce -...
Building a continuous delivery platform for the biggest spike in e-commerce -...
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computing
 
S0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cudaS0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cuda
 

Dernier

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

GPU Computing with Ruby

  • 1. GPU Computing with Ruby SpeedGo Computing Chung Shin Yee shinyee@speedgocomputing.com
  • 2. CPU vs GPU Architecture 6 Core vs 1024 Core 6 GB/s vs 300 GB/s Memory Bandwidth By CUDA C Programming Guide
  • 3. CUDA Programming Model . . . . By CUDA C Programming Guide
  • 4. Existing Programming Tools ● Cg ● BrookGPU ● GLSL (OpenGL Shading Language) ● Nvidia CUDA C/C++ ● OpenCL ● PyCUDA Where is the Red Ruby ?
  • 5. Bridging Ruby & CUDA C/C++ ● Ruby C extension – Hard to manipulate Ruby objects in C. – Compilation problems. ● Ruby FFI – Bridging purely in Ruby. – Support multiple Ruby implementations.
  • 7. Developing SGC Ruby CUDA ● Object-oriented API. ● Start with crucial operations. – Memory allocation. – Memory transfer. – Kernel launch. – Wrapper for structures. ● Documented with YARD.
  • 8. Driver vs Runtime API ● CUDA Driver API – For system developers. – Supported by PyCUDA. ● CUDA Runtime API – For computation centric developers. We going to support both API !
  • 9. Using SGC Ruby CUDA ● Kernel program in CUDA C.
  • 10. Using SGC Ruby CUDA ● Compiling kernel into PTX. – nvcc --ptx vadd.cu
  • 11. Using SGC Ruby CUDA ● Setup require 'rubycu' include SGC::CU CUInit.init d = CUDevice.get(0) c = CUContext.create(d) m = CUModule.new.load(“vadd.ptx”) f = m.function(“vadd”)
  • 12. Using SGC Ruby CUDA ● Memory allocations da = CUDevice.malloc(10*4) db = CUDevice.malloc(10*4) dc = CUDevice.malloc(10*4) ha = Buffer.new(:int, 10) hb = Buffer.new(:int, 10) hc = Buffer.new(:int, 10)
  • 13. Using SGC Ruby CUDA ● Initialization (0...10).each { |i| ha[i] = i hb[i] = 1 hc[i] = ha[i] + hb[i] hd[i] = 0 }
  • 14. Using SGC Ruby CUDA ● Transfer inputs to the GPU CUMemory.memcpy_htod(da, ha, 4*10) CUMemory.memcpy_htod(db, hb, 4*10) CUMemory.memcpy_htod(dc, hc, 4*10)
  • 15. Using SGC Ruby CUDA ● Launch kernel on GPU # Launch with 1x1x1 grid, # 10x1x1 blocks, params = [da, db, dc, 10] f.launch_kernel(1, 1, 1, 10, 1, 1, 0, 0, params) By CUDA C Programming Guide By CUDA C Programming Guide
  • 16. Using SGC Ruby CUDA ● Transfer results back to system memory CUMemory.memcpy_dtoh(hd, dc, 4*10) ● Verify results (0...10).each { |i| assert_equal(hc[i], hd[i]) }
  • 17. Problematic CUDA Runtime API ● For use in a CUDA C/C++ program. ● Workaround – CUDA C/C++ effectively uses C/C++ bindings. – Create dynamic library for the kernel programs. – Load the library at runtime.
  • 18. Current Limitations ● Support limited data types. – Fixnum → int – ?? → long – Float → float – ?? → double ● No supports for CUDA C++ templates. ● No Ruby in a kernel program.
  • 19. To Support ● Texture memory. ● New features in CUDA 4.0 – Multi-GPU. – Unified Virtual Memory. ● More C data types. ● Mac platform.
  • 20. Try It Now! Thank You ~ git clone git://github.com/xman/sgc-ruby-cuda.git cd sgc-ruby-cuda gem install ffi yard rake test rake yard