SlideShare a Scribd company logo
1 of 13
Download to read offline
INTRODUCTION
TO OPENCL
Unai Lopez

Intelligent Systems Group

Department of Computer Architecture & Technology

University of the Basque Country
Outline
1)  Introduction

2)  Programming Basics


3)  “Hello World”


4)  Final remarks
OpenCL
•  Standard for the development of data parallel applications

•  Most used for the development of GPGPU applications:
 General Purpose computing on Graphics Processing Units

•  A GPU is comprised of hundreds of compute cores




     nVidia GTX 285 (240 Compute cores)   nVidia GT200b Architecture

•  Specialized for massively data parallel computation
OpenCL & GPGPU
•  GPGPU: Take advantage of GPU’s computing power to
 make massively parallel applications

•  Parallel applications with huge acceleration in Molecular
 Dynamics, Image Processing, Evolutionary Computation,…

•  All cases based on data parallelism:
 each thread processes a subset of the data

•  For example, a vector addition:

        A	

        +	

        B	

        ||	

        C	

   Thread ID    0   1   2   3   4   5   6   7   8   9   10   11
OpenCL
•  Furthermore, OpenCL provides portability:
  same code can run on different architectures

•  For example:




Intel Core i5 CPU      STICell B/E        Intel Xeon Phi    AMD HD 6950 GPU
4 cores @ 2’5 Ghz   8 cores @ 3,2 Ghz   50 cores @ 1 Ghz   1408 cores @ 800 Mhz
OpenCL
•  Provides the following abstraction:
 A compute device is composed by compute units




•  OpenCL platform: Host + Compute Devices

•  Each manufacturer provides an SDK:
   •  NVIDIA SDK for GPUs
   •  AMD APP for CPUs/GPU
   •  Intel for CPUs
   •  IBM for PowerPC and Cell B/E
Programming Basics
•  Kernel: function that defines the behavior of each thread


•  For example, kernel for vector addition:
  __kernel void sumKernel (
  __global int* a, __global int* b, __global int* c)
  {
     int i = get_global_id(0);
     c[i] = a[i] + b[i];
  }


•  Written in OpenCL-C: ANSI-C + Set of kernel functions, e.g.:
   •  get_global_id: obtains thread index
   •  barrier: synchronizes threads
Programming Basics
•  An OpenCL applications consists of:



 Kernel file (OpenCL-C): problem computation   Host code(C): kernel management


•  Basic host application flow:
   1.  Load and Compilation of kernel
   2.  Data copy from host to device (e.g. from CPU to GPU)
   3.  Execution of kernel
   4.  Data copy from device to host
   5.  Release kernels and data from device memory
•  Execution using command queue in each device
Programming Basics
•  Host code: programmed using OpenCL API

•  API Calls, such as:
   •  clCreateProgramWithSource: Load kernel from char*
   •  clBuildProgram: Compile kernel
   •  clSetKernelArgs: Set kernel arguments for the device
   •  clEnqueueWriteBuffer/clEnqueueRead: Copy data vector to device
   •  clEnqueueNDRangerKernel: Launch kernel in device


•  API Types, such as:
   •  cl_mem: Pointer to device memory objects
   •  cl_program: Kernel object
   •  cl_float / cl_int / cl_uint: Redefinition of C types
Hello World

•  Implementation of simple vector addition in OpenCL


•  Checks for default platform and device in the system


•  Modify Makefile with proper paths in each system


•  Run: vectorAdd <size_of_vector>
Final Remarks
•  OpenCL does not provide performance portability

•  Alternative to NVIDIA CUDA:
 Programming paradigm for NVIDIA GPU cards

•  Combinable with other parallel programming models:
   •  OpenMP for SMPs / MPI for MPPs


•  Huge ecosystems for OpenCL, e.g. OpenACC:
 Develop GPGPU applications using directives
           #pragma acc kernels
           for(i = 0; i< N; i++)
              c[i] = b[i] + a[i];
More about OpenCL
•  Before starting to develop take a look at:
   •  Context, command queues, events,…
•  Documentation
   •  Khronos Group: Maintainers of OpenCL
   •  OpenCL Best practices guide in CUDA/AMD SDKs
   •  Programming Massively Parallel Processors (Book for CUDA)


•  OpenCL sample applications:
   •  Most SDKs include example OpenCL applications
   •  Rodinia: http://lava.cs.virginia.edu/wiki/rodinia
   •  Parboil: http://impact.crhc.illinois.edu/parboil.aspx
INTRODUCTION
TO OPENCL
Unai Lopez – ulopez009@ehu.es

Intelligent Systems Group

Department of Computer Architecture & Technology

University of the Basque Country

More Related Content

What's hot

Parallel Programming
Parallel ProgrammingParallel Programming
Parallel ProgrammingUday Sharma
 
virtualization and hypervisors
virtualization and hypervisorsvirtualization and hypervisors
virtualization and hypervisorsGaurav Suri
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDARaymond Tay
 
Virtualization Architecture & KVM
Virtualization Architecture & KVMVirtualization Architecture & KVM
Virtualization Architecture & KVMPradeep Kumar
 
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVMHypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVMvwchu
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel ComputingAkhila Prabhakaran
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunheut2008
 
Introduction to Shell script
Introduction to Shell scriptIntroduction to Shell script
Introduction to Shell scriptBhavesh Padharia
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computingVajira Thambawita
 
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, XilinxXPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, XilinxThe Linux Foundation
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture Haris456
 

What's hot (20)

Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
 
virtualization and hypervisors
virtualization and hypervisorsvirtualization and hypervisors
virtualization and hypervisors
 
CUDA Architecture
CUDA ArchitectureCUDA Architecture
CUDA Architecture
 
Virtualization.ppt
Virtualization.pptVirtualization.ppt
Virtualization.ppt
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
 
Linux device drivers
Linux device drivers Linux device drivers
Linux device drivers
 
Virtualization Architecture & KVM
Virtualization Architecture & KVMVirtualization Architecture & KVM
Virtualization Architecture & KVM
 
Cuda
CudaCuda
Cuda
 
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVMHypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zun
 
Introduction to Shell script
Introduction to Shell scriptIntroduction to Shell script
Introduction to Shell script
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computing
 
Embedded Linux on ARM
Embedded Linux on ARMEmbedded Linux on ARM
Embedded Linux on ARM
 
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, XilinxXPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
 
Linux systems - Getting started with setting up and embedded platform
Linux systems - Getting started with setting up and embedded platformLinux systems - Getting started with setting up and embedded platform
Linux systems - Getting started with setting up and embedded platform
 
Linux Internals - Part I
Linux Internals - Part ILinux Internals - Part I
Linux Internals - Part I
 
Linux Systems: Getting started with setting up an Embedded platform
Linux Systems: Getting started with setting up an Embedded platformLinux Systems: Getting started with setting up an Embedded platform
Linux Systems: Getting started with setting up an Embedded platform
 
Linux kernel
Linux kernelLinux kernel
Linux kernel
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
 

Viewers also liked

Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
OpenCL applications in genomics
OpenCL applications in genomicsOpenCL applications in genomics
OpenCL applications in genomicsUSC
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesAMD Developer Central
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...Edge AI and Vision Alliance
 
FPGA Architecture Presentation
FPGA Architecture PresentationFPGA Architecture Presentation
FPGA Architecture Presentationomutukuda
 
Field programable gate array
Field programable gate arrayField programable gate array
Field programable gate arrayNeha Agarwal
 
FPGAs : An Overview
FPGAs : An OverviewFPGAs : An Overview
FPGAs : An OverviewSanjiv Malik
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processingPage Maker
 
Fundamentals of FPGA
Fundamentals of FPGAFundamentals of FPGA
Fundamentals of FPGAvelamakuri
 

Viewers also liked (13)

Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
OpenCL applications in genomics
OpenCL applications in genomicsOpenCL applications in genomics
OpenCL applications in genomics
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
 
FPGA Architecture Presentation
FPGA Architecture PresentationFPGA Architecture Presentation
FPGA Architecture Presentation
 
Field programable gate array
Field programable gate arrayField programable gate array
Field programable gate array
 
FPGAs : An Overview
FPGAs : An OverviewFPGAs : An Overview
FPGAs : An Overview
 
FPGA Introduction
FPGA IntroductionFPGA Introduction
FPGA Introduction
 
FPGA
FPGAFPGA
FPGA
 
What is FPGA?
What is FPGA?What is FPGA?
What is FPGA?
 
FPGA
FPGAFPGA
FPGA
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processing
 
Fundamentals of FPGA
Fundamentals of FPGAFundamentals of FPGA
Fundamentals of FPGA
 

Similar to Introduction to OpenCL

Introduction to OpenCL By Hammad Ghulam Mustafa
Introduction to OpenCL By Hammad Ghulam MustafaIntroduction to OpenCL By Hammad Ghulam Mustafa
Introduction to OpenCL By Hammad Ghulam MustafaHAMMAD GHULAM MUSTAFA
 
MattsonTutorialSC14.pptx
MattsonTutorialSC14.pptxMattsonTutorialSC14.pptx
MattsonTutorialSC14.pptxgopikahari7
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computingbakers84
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapGeorge Markomanolis
 
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...AMD Developer Central
 
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...mouhouioui
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLinaro
 
Open cl programming using python syntax
Open cl programming using python syntaxOpen cl programming using python syntax
Open cl programming using python syntaxcsandit
 
OpenCL programming using Python syntax
OpenCL programming using Python syntax OpenCL programming using Python syntax
OpenCL programming using Python syntax cscpconf
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerGeorge Markomanolis
 
Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer George Markomanolis
 
Using Deep Learning Toolkits with Kubernetes clusters
Using Deep Learning Toolkits with Kubernetes clustersUsing Deep Learning Toolkits with Kubernetes clusters
Using Deep Learning Toolkits with Kubernetes clustersJoy Qiao
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeOfer Rosenberg
 
Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsUnai Lopez-Novoa
 
Lec 10-linux-review
Lec 10-linux-reviewLec 10-linux-review
Lec 10-linux-reviewabinaya m
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusJakob Karalus
 
lecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdflecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdfTigabu Yaya
 
3 Open-Source-SYCL-Intel-Khronos-EVS-Workshop_May19.pdf
3 Open-Source-SYCL-Intel-Khronos-EVS-Workshop_May19.pdf3 Open-Source-SYCL-Intel-Khronos-EVS-Workshop_May19.pdf
3 Open-Source-SYCL-Intel-Khronos-EVS-Workshop_May19.pdfJunZhao68
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2Linaro
 

Similar to Introduction to OpenCL (20)

Introduction to OpenCL By Hammad Ghulam Mustafa
Introduction to OpenCL By Hammad Ghulam MustafaIntroduction to OpenCL By Hammad Ghulam Mustafa
Introduction to OpenCL By Hammad Ghulam Mustafa
 
MattsonTutorialSC14.pptx
MattsonTutorialSC14.pptxMattsonTutorialSC14.pptx
MattsonTutorialSC14.pptx
 
MattsonTutorialSC14.pdf
MattsonTutorialSC14.pdfMattsonTutorialSC14.pdf
MattsonTutorialSC14.pdf
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computing
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...
 
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience Report
 
Open cl programming using python syntax
Open cl programming using python syntaxOpen cl programming using python syntax
Open cl programming using python syntax
 
OpenCL programming using Python syntax
OpenCL programming using Python syntax OpenCL programming using Python syntax
OpenCL programming using Python syntax
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI Supercomputer
 
Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer
 
Using Deep Learning Toolkits with Kubernetes clusters
Using Deep Learning Toolkits with Kubernetes clustersUsing Deep Learning Toolkits with Kubernetes clusters
Using Deep Learning Toolkits with Kubernetes clusters
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universe
 
Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern Coprocessors
 
Lec 10-linux-review
Lec 10-linux-reviewLec 10-linux-review
Lec 10-linux-review
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
 
lecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdflecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdf
 
3 Open-Source-SYCL-Intel-Khronos-EVS-Workshop_May19.pdf
3 Open-Source-SYCL-Intel-Khronos-EVS-Workshop_May19.pdf3 Open-Source-SYCL-Intel-Khronos-EVS-Workshop_May19.pdf
3 Open-Source-SYCL-Intel-Khronos-EVS-Workshop_May19.pdf
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2
 

More from Unai Lopez-Novoa

Exploring performance and energy consumption differences between recent Intel...
Exploring performance and energy consumption differences between recent Intel...Exploring performance and energy consumption differences between recent Intel...
Exploring performance and energy consumption differences between recent Intel...Unai Lopez-Novoa
 
A Platform for Overcrowding Detection in Indoor Events using Scalable Technol...
A Platform for Overcrowding Detection in Indoor Events using Scalable Technol...A Platform for Overcrowding Detection in Indoor Events using Scalable Technol...
A Platform for Overcrowding Detection in Indoor Events using Scalable Technol...Unai Lopez-Novoa
 
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...Unai Lopez-Novoa
 
Introducción a la Computación Paralela
Introducción a la Computación ParalelaIntroducción a la Computación Paralela
Introducción a la Computación ParalelaUnai Lopez-Novoa
 
Computación Heterogénea: Aplicaciones y Modelado de Rendimiento
Computación Heterogénea: Aplicaciones y Modelado de RendimientoComputación Heterogénea: Aplicaciones y Modelado de Rendimiento
Computación Heterogénea: Aplicaciones y Modelado de RendimientoUnai Lopez-Novoa
 
Tolerancia a fallos en MPI con Checkpointing
Tolerancia a fallos en MPI con CheckpointingTolerancia a fallos en MPI con Checkpointing
Tolerancia a fallos en MPI con CheckpointingUnai Lopez-Novoa
 

More from Unai Lopez-Novoa (8)

Exploring performance and energy consumption differences between recent Intel...
Exploring performance and energy consumption differences between recent Intel...Exploring performance and energy consumption differences between recent Intel...
Exploring performance and energy consumption differences between recent Intel...
 
A Platform for Overcrowding Detection in Indoor Events using Scalable Technol...
A Platform for Overcrowding Detection in Indoor Events using Scalable Technol...A Platform for Overcrowding Detection in Indoor Events using Scalable Technol...
A Platform for Overcrowding Detection in Indoor Events using Scalable Technol...
 
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
 
Introducción a la Computación Paralela
Introducción a la Computación ParalelaIntroducción a la Computación Paralela
Introducción a la Computación Paralela
 
Computación Heterogénea: Aplicaciones y Modelado de Rendimiento
Computación Heterogénea: Aplicaciones y Modelado de RendimientoComputación Heterogénea: Aplicaciones y Modelado de Rendimiento
Computación Heterogénea: Aplicaciones y Modelado de Rendimiento
 
Exploring Gpgpu Workloads
Exploring Gpgpu WorkloadsExploring Gpgpu Workloads
Exploring Gpgpu Workloads
 
Tolerancia a fallos en MPI con Checkpointing
Tolerancia a fallos en MPI con CheckpointingTolerancia a fallos en MPI con Checkpointing
Tolerancia a fallos en MPI con Checkpointing
 
Introduccion a MPI
Introduccion a MPIIntroduccion a MPI
Introduccion a MPI
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Introduction to OpenCL

  • 1. INTRODUCTION TO OPENCL Unai Lopez Intelligent Systems Group Department of Computer Architecture & Technology University of the Basque Country
  • 2. Outline 1)  Introduction 2)  Programming Basics 3)  “Hello World” 4)  Final remarks
  • 3. OpenCL •  Standard for the development of data parallel applications •  Most used for the development of GPGPU applications: General Purpose computing on Graphics Processing Units •  A GPU is comprised of hundreds of compute cores nVidia GTX 285 (240 Compute cores) nVidia GT200b Architecture •  Specialized for massively data parallel computation
  • 4. OpenCL & GPGPU •  GPGPU: Take advantage of GPU’s computing power to make massively parallel applications •  Parallel applications with huge acceleration in Molecular Dynamics, Image Processing, Evolutionary Computation,… •  All cases based on data parallelism: each thread processes a subset of the data •  For example, a vector addition: A + B || C Thread ID 0 1 2 3 4 5 6 7 8 9 10 11
  • 5. OpenCL •  Furthermore, OpenCL provides portability: same code can run on different architectures •  For example: Intel Core i5 CPU STICell B/E Intel Xeon Phi AMD HD 6950 GPU 4 cores @ 2’5 Ghz 8 cores @ 3,2 Ghz 50 cores @ 1 Ghz 1408 cores @ 800 Mhz
  • 6. OpenCL •  Provides the following abstraction: A compute device is composed by compute units •  OpenCL platform: Host + Compute Devices •  Each manufacturer provides an SDK: •  NVIDIA SDK for GPUs •  AMD APP for CPUs/GPU •  Intel for CPUs •  IBM for PowerPC and Cell B/E
  • 7. Programming Basics •  Kernel: function that defines the behavior of each thread •  For example, kernel for vector addition: __kernel void sumKernel ( __global int* a, __global int* b, __global int* c) { int i = get_global_id(0); c[i] = a[i] + b[i]; } •  Written in OpenCL-C: ANSI-C + Set of kernel functions, e.g.: •  get_global_id: obtains thread index •  barrier: synchronizes threads
  • 8. Programming Basics •  An OpenCL applications consists of: Kernel file (OpenCL-C): problem computation Host code(C): kernel management •  Basic host application flow: 1.  Load and Compilation of kernel 2.  Data copy from host to device (e.g. from CPU to GPU) 3.  Execution of kernel 4.  Data copy from device to host 5.  Release kernels and data from device memory •  Execution using command queue in each device
  • 9. Programming Basics •  Host code: programmed using OpenCL API •  API Calls, such as: •  clCreateProgramWithSource: Load kernel from char* •  clBuildProgram: Compile kernel •  clSetKernelArgs: Set kernel arguments for the device •  clEnqueueWriteBuffer/clEnqueueRead: Copy data vector to device •  clEnqueueNDRangerKernel: Launch kernel in device •  API Types, such as: •  cl_mem: Pointer to device memory objects •  cl_program: Kernel object •  cl_float / cl_int / cl_uint: Redefinition of C types
  • 10. Hello World •  Implementation of simple vector addition in OpenCL •  Checks for default platform and device in the system •  Modify Makefile with proper paths in each system •  Run: vectorAdd <size_of_vector>
  • 11. Final Remarks •  OpenCL does not provide performance portability •  Alternative to NVIDIA CUDA: Programming paradigm for NVIDIA GPU cards •  Combinable with other parallel programming models: •  OpenMP for SMPs / MPI for MPPs •  Huge ecosystems for OpenCL, e.g. OpenACC: Develop GPGPU applications using directives #pragma acc kernels for(i = 0; i< N; i++) c[i] = b[i] + a[i];
  • 12. More about OpenCL •  Before starting to develop take a look at: •  Context, command queues, events,… •  Documentation •  Khronos Group: Maintainers of OpenCL •  OpenCL Best practices guide in CUDA/AMD SDKs •  Programming Massively Parallel Processors (Book for CUDA) •  OpenCL sample applications: •  Most SDKs include example OpenCL applications •  Rodinia: http://lava.cs.virginia.edu/wiki/rodinia •  Parboil: http://impact.crhc.illinois.edu/parboil.aspx
  • 13. INTRODUCTION TO OPENCL Unai Lopez – ulopez009@ehu.es Intelligent Systems Group Department of Computer Architecture & Technology University of the Basque Country