SlideShare a Scribd company logo
1 of 21
Download to read offline
ArrayFire Webinar:
OpenCL and CUDA Trade-Offs and Comparisons
GPU Software Features
              Programmability
Portability                 Scalability

Performance                Community
Leading GPU Computing Platforms
              Programmability
Portability                 Scalability

Performance                Community
Performance
• Both CUDA and OpenCL are fast
• Both can fully utilize the hardware
• Devil in the details:
  – Hardware type, algorithm type, code quality
Performance
• ArrayFire results at end of webinar
Scalability
• Laptops --> Single GPU Machine
  – Both CUDA and OpenCL scale, no code change
• Single GPU Machine --> Multi-GPU Machine
  – User managed, low-level synchronization
• Multi-GPU Machine --> Cluster
  – MPI
Scalability
Interesting developments:
• Memory Transfer Optimizations
  – CUDA GPUDirect technology
• Mobile GPU Computing
  – OpenCL available on ARM, Imgtec, Freescale, …
Scalability
• Laptops --> Single GPU Machine
   – ArrayFire’s JIT optimizes for GPU type
• Single GPU Machine --> Multi-GPU Machine
   – ArrayFire’s deviceset() function is super easy
• Multi-GPU Machine --> Cluster
   – MPI
Portability
• CUDA is NVIDIA-only
  – Open source announcement
  – Does not provide CPU fallback
• OpenCL is the open industry standard
  – Runs on AMD, Intel, and NVIDIA
  – Provides CPU fallback
Portability
• ArrayFire is fully portable
  – Same ArrayFire code runs on CUDA or OpenCL
  – Simply select the right library
Community
• NVIDIA CUDA Forums – 26,893 topics
• AMD OpenCL Forums – 4,038 topics

• Stackoverflow CUDA Tag – 1,709 tags
• Stackoverflow OpenCL Tag – 564 tags
Community
• AccelerEyes GPU Forums – 1,435 topics
  – Largest GPU forums by a software company
• Next largest
  – PGI GPU Forums – 485 topics
Programmability
• Both CUDA and OpenCL are low-level
  – Time consuming kernel development
  – Data-parallel algorithm design
• Focus on programmability interfaces
Programmability

      Faster




                   SSE or
      Slower
                    AVX


               Time-consuming   Easy-to-use
Programmability
                  Writing
      Faster
                  Kernels




                   SSE or
      Slower
                    AVX


               Time-consuming   Easy-to-use
Programmability
                  Writing
      Faster
                  Kernels




                   SSE or       Compiler
      Slower
                    AVX         Directives


               Time-consuming   Easy-to-use
Programmability
                  Writing          Using
      Faster
                  Kernels        Libraries




                   SSE or       Compiler
      Slower
                    AVX         Directives


               Time-consuming   Easy-to-use
Libraries Make the Difference
Raw math libraries in NVIDIA CUDA
• CUBLAS, CUFFT, CULA, Magma
  – Provides all BLAS, LAPACK, and FFT routines
    necessary for most dense matrix operations
• CUSPARSE
  – A good start for sparse linear algebra
Libraries Make the Difference
Raw math libraries in AMD OpenCL
• clAmdBlas, clAmdFft
  – Provides most important Blas routines
  – Provides radix 2, 3, and 5 FFT routines
  – No LAPACK support
  – No sparse data support
Libraries Make the Difference
Raw math libraries in AMD OpenCL
• clAmdBlas, clAmdFft
  – Provides most important Blas routines
  – Provides radix 2, 3, and 5 FFT routines
  – No LAPACK support
  – No sparse data support
  – Runs on any OpenCL-compliant device
ArrayFire Code and Benchmarks

More Related Content

What's hot

What's hot (20)

My ambariexperience
My ambariexperienceMy ambariexperience
My ambariexperience
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Avro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and HadoopAvro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and Hadoop
 
Kafka Tutorial, Kafka ecosystem with clustering examples
Kafka Tutorial, Kafka ecosystem with clustering examplesKafka Tutorial, Kafka ecosystem with clustering examples
Kafka Tutorial, Kafka ecosystem with clustering examples
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
 
Deploying microservices on AWS
Deploying microservices on AWSDeploying microservices on AWS
Deploying microservices on AWS
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On Docker
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015
 
Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker Containers
 
Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer Consumers
 
Apache spark
Apache sparkApache spark
Apache spark
 

Similar to CUDA vs OpenCL

SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4
UniFabric
 
Dad i want a supercomputer on my next
Dad i want a supercomputer on my nextDad i want a supercomputer on my next
Dad i want a supercomputer on my next
Akash Sahoo
 

Similar to CUDA vs OpenCL (20)

Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
 
pps Matters
pps Matterspps Matters
pps Matters
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big Data
 
SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AI
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
 
CA UNIT IV.pptx
CA UNIT IV.pptxCA UNIT IV.pptx
CA UNIT IV.pptx
 
Go Faster with Ansible (AWS meetup)
Go Faster with Ansible (AWS meetup)Go Faster with Ansible (AWS meetup)
Go Faster with Ansible (AWS meetup)
 
Tech Days 2015: Embedded Product Update
Tech Days 2015: Embedded Product UpdateTech Days 2015: Embedded Product Update
Tech Days 2015: Embedded Product Update
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache Spark
 
HPC in the Cloud
HPC in the CloudHPC in the Cloud
HPC in the Cloud
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)
 
Servers Technologies and Enterprise Data Center Trends 2014 - Thailand
Servers Technologies and Enterprise Data Center Trends 2014 - ThailandServers Technologies and Enterprise Data Center Trends 2014 - Thailand
Servers Technologies and Enterprise Data Center Trends 2014 - Thailand
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
Dad i want a supercomputer on my next
Dad i want a supercomputer on my nextDad i want a supercomputer on my next
Dad i want a supercomputer on my next
 
0 foundation update__final - Mendy Furmanek
0 foundation update__final - Mendy Furmanek0 foundation update__final - Mendy Furmanek
0 foundation update__final - Mendy Furmanek
 

Recently uploaded

Recently uploaded (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

CUDA vs OpenCL

  • 1. ArrayFire Webinar: OpenCL and CUDA Trade-Offs and Comparisons
  • 2. GPU Software Features Programmability Portability Scalability Performance Community
  • 3. Leading GPU Computing Platforms Programmability Portability Scalability Performance Community
  • 4. Performance • Both CUDA and OpenCL are fast • Both can fully utilize the hardware • Devil in the details: – Hardware type, algorithm type, code quality
  • 6. Scalability • Laptops --> Single GPU Machine – Both CUDA and OpenCL scale, no code change • Single GPU Machine --> Multi-GPU Machine – User managed, low-level synchronization • Multi-GPU Machine --> Cluster – MPI
  • 7. Scalability Interesting developments: • Memory Transfer Optimizations – CUDA GPUDirect technology • Mobile GPU Computing – OpenCL available on ARM, Imgtec, Freescale, …
  • 8. Scalability • Laptops --> Single GPU Machine – ArrayFire’s JIT optimizes for GPU type • Single GPU Machine --> Multi-GPU Machine – ArrayFire’s deviceset() function is super easy • Multi-GPU Machine --> Cluster – MPI
  • 9. Portability • CUDA is NVIDIA-only – Open source announcement – Does not provide CPU fallback • OpenCL is the open industry standard – Runs on AMD, Intel, and NVIDIA – Provides CPU fallback
  • 10. Portability • ArrayFire is fully portable – Same ArrayFire code runs on CUDA or OpenCL – Simply select the right library
  • 11. Community • NVIDIA CUDA Forums – 26,893 topics • AMD OpenCL Forums – 4,038 topics • Stackoverflow CUDA Tag – 1,709 tags • Stackoverflow OpenCL Tag – 564 tags
  • 12. Community • AccelerEyes GPU Forums – 1,435 topics – Largest GPU forums by a software company • Next largest – PGI GPU Forums – 485 topics
  • 13. Programmability • Both CUDA and OpenCL are low-level – Time consuming kernel development – Data-parallel algorithm design • Focus on programmability interfaces
  • 14. Programmability Faster SSE or Slower AVX Time-consuming Easy-to-use
  • 15. Programmability Writing Faster Kernels SSE or Slower AVX Time-consuming Easy-to-use
  • 16. Programmability Writing Faster Kernels SSE or Compiler Slower AVX Directives Time-consuming Easy-to-use
  • 17. Programmability Writing Using Faster Kernels Libraries SSE or Compiler Slower AVX Directives Time-consuming Easy-to-use
  • 18. Libraries Make the Difference Raw math libraries in NVIDIA CUDA • CUBLAS, CUFFT, CULA, Magma – Provides all BLAS, LAPACK, and FFT routines necessary for most dense matrix operations • CUSPARSE – A good start for sparse linear algebra
  • 19. Libraries Make the Difference Raw math libraries in AMD OpenCL • clAmdBlas, clAmdFft – Provides most important Blas routines – Provides radix 2, 3, and 5 FFT routines – No LAPACK support – No sparse data support
  • 20. Libraries Make the Difference Raw math libraries in AMD OpenCL • clAmdBlas, clAmdFft – Provides most important Blas routines – Provides radix 2, 3, and 5 FFT routines – No LAPACK support – No sparse data support – Runs on any OpenCL-compliant device
  • 21. ArrayFire Code and Benchmarks