OpenCL for RISC-V
Shao-Chung Wang
RISC-V Summit, Dec. 8, 2020
Agenda
1. OpenCL Introduction
2. OpenCL Extension for RVV Cores
3. OpenCL Framework for RISC-V
4. Status
Taking RISC-V® Mainstream 3
Open Computing Language
• A popular framework for writing programs that execute across
heterogeneous platforms consisting of CPUs, GPUs, DSPs,
FPGAs or hardware accelerators with a host and multiple
devices
• Examples of host + device pairs
– x86 + multiple Andes NX27V (vector processors)
– Andes AX45MP + multiple Andes NX27V
– Andes AX45MP + multiple HW accelerators
Open Computing Language
• OpenCL Runtime
– Platform Layer API
• Query, select, and initialize compute devices
– Runtime API
• Build and dispatch kernel programs
• Resource management
• OpenCL kernel language
– OpenCL C - subset of C99 but with language extensions
– A set of built-in functions
OpenCL Framework Overview
• OpenCL programs consist of host code and kernel code
– The host program runs on the CPU
– Kernels can run on both the CPU and hardware accelerators
[Figure: framework layers. Application: host program and OpenCL kernels. OpenCL runtime: platform API, runtime API. OpenCL compiler: frontend, backend. Hardware: CPU and hardware accelerators.]
Example: Vector Addition
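• A simple C program uses a loop to add two vectors.
• It serves as the “hello world” program for data-parallel programming.
/* Element-wise sum: a[i] = b[i] + c[i] for i in [0, n). */
void vadd(float *a, const float *b, const float *c, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + c[i];
    }
}

int main(void)
{
    /* float arrays to match vadd's parameter types; inputs are zero-initialized */
    float a[100], b[100] = {0}, c[100] = {0};
    vadd(a, b, c, 100);
    return 0;
}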
Expressing Data Parallelism in OpenCL
• Work item: the smallest parallel execution unit
– A problem domain is defined over which the kernel executes
• Work group: a set of work items
– Work items in the same group can be synchronized
• For vector addition, the problem domain is defined as follows
– Two arrays of 100 elements are processed
– One kernel instance performs the addition for one array element
– 100 kernel instances are executed in total
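The relationship between work items and work groups can be sketched in plain C. This is a host-side illustration only, assuming a one-dimensional problem domain whose size is a multiple of the work-group size; `work_item_t`, `num_work_groups`, and `locate_work_item` are hypothetical names and not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one work item in a 1-D problem domain (hypothetical helper type). */
typedef struct {
    size_t global_id; /* unique across the whole problem domain */
    size_t group_id;  /* work group the item belongs to */
    size_t local_id;  /* position of the item inside its work group */
} work_item_t;

/* Number of work groups needed to cover the problem domain. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Split a global id into its group id and local id. */
work_item_t locate_work_item(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    work_item_t wi;
    wi.global_id = global_id;
    wi.group_id = global_id / local_size;
    wi.local_id = global_id % local_size;
    return wi;
}
```

For the 100-element vector addition with an assumed work-group size of 10, this gives 10 work groups, and work item 37 is item 7 of group 3.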
Vector Addition (Kernel)
__kernel void vadd (__global const float *a,
__global const float *b,
__global float *c)
{
int gid = get_global_id(0) ;
c[gid] = a[gid] + b[gid];
}
• __kernel: function qualifier that marks the function as a kernel
• __global: address space qualifier (one of __private, __local, __global, or __constant) that annotates where the data resides
• get_global_id(0): built-in function that returns the unique ID of the work item
Vector Addition (Host Program)
int main () {
    ……
    /* Set up the platform and queues */
    clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &cb);
    clGetContextInfo(context, CL_CONTEXT_DEVICES, cb, devices, NULL);
    cmd_queue = clCreateCommandQueue(context, devices[0], 0, NULL);
    ……
    /* Allocate memory buffers */
    memobjs[0] = clCreateBuffer(context, CL_MEM_COPY_HOST_PTR,…);
    ……
    /* Build the kernel (or load a binary) */
    program = clCreateProgramWithSource(context, 1, &program_source, …);
    clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    /* Set up the kernel */
    kernel = clCreateKernel(program, "vadd", NULL);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *) &memobjs[0]);
    ……
    /* Execute the kernel */
    global_work_size[0] = n;
    clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, global_work_size, …);
    /* Read the result from the device */
    clEnqueueReadBuffer(cmd_queue, memobjs[2], CL_TRUE, 0, …);
    ……
}
An Example OpenCL Platform
• Host: x86
• Devices: 32 NX27V cores with RVV support
• Each core runs one or more work items at a time
• Host runtime dispatches work groups to multiple cores for parallel
execution
[Figure: example platform. Host: x86 with cache and host memory. Device: multiple NX27V cores, each with local memory, and device memory.]
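One way a host runtime could spread work groups over the cores is a simple round-robin assignment. The slides do not state the scheduling policy, so the policy itself and the names `core_for_group` and `count_groups_per_core` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Round-robin policy (an assumption, not taken from the slides):
 * work group g is dispatched to core g mod num_cores. */
size_t core_for_group(size_t group_id, size_t num_cores)
{
    assert(num_cores > 0);
    return group_id % num_cores;
}

/* Fill counts[0..num_cores-1] with the number of work groups each core receives. */
void count_groups_per_core(size_t num_groups, size_t num_cores, size_t *counts)
{
    assert(num_cores > 0 && counts != NULL);
    for (size_t core = 0; core < num_cores; core++)
        counts[core] = 0;
    for (size_t g = 0; g < num_groups; g++)
        counts[core_for_group(g, num_cores)]++;
}
```

With 100 work groups and 32 cores, the first four cores receive four groups each and the remaining 28 cores receive three each.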
OpenCL C Extension for RVV
• Supports RVV intrinsics and new built-in functions
C with RVV intrinsics:
#include <stddef.h>
#include <riscv_vector.h>

void vadd_rvv(float *a, float *b, float *c, int n)
{
    int tn = n;
    while (tn > 0) {
        /* vl is the number of elements handled in this iteration */
        size_t vl = vsetvl_e32m1(tn);
        vfloat32m1_t vb = vle32_v_f32m1(b);
        vfloat32m1_t vc = vle32_v_f32m1(c);
        vfloat32m1_t va = vfadd_vv_f32m1(vb, vc);
        vse32_v_f32m1(a, va);
        a += vl; b += vl;
        c += vl; tn -= vl;
    }
}
OpenCL kernel with RVV intrinsics:
__kernel
void vadd_rvv_cl(__global float *a,
                 __global float *b,
                 __global float *c,
                 int n)
{
    // get_work_id returns the index of the first element
    // to be processed by this work item
    int wi = get_work_id(sizeof(float), n, 0);
    vfloat32m1_t vb = vle32_v_f32m1(&b[wi]);
    vfloat32m1_t vc = vle32_v_f32m1(&c[wi]);
    vfloat32m1_t va = vfadd_vv_f32m1(vb, vc);
    vse32_v_f32m1(&a[wi], va);
}
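The strip-mining loop used by the C version can be emulated in portable C to see how the array tail is handled. This sketch assumes a hypothetical maximum of `VLMAX_E32M1` 32-bit elements per vector register. `emulated_vsetvl` and `vadd_stripmined` are illustrative names, not RVV intrinsics, and the inner scalar loop stands in for one vector load, add, and store sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of 32-bit elements per vector register (assumption). */
#define VLMAX_E32M1 ((size_t)4)

/* Simplified model of vsetvl: take as many elements as fit, up to VLMAX.
 * Real hardware may choose a different vl in some cases. */
size_t emulated_vsetvl(size_t remaining)
{
    return remaining < VLMAX_E32M1 ? remaining : VLMAX_E32M1;
}

/* Strip-mined a[i] = b[i] + c[i]; returns the number of strips processed. */
size_t vadd_stripmined(float *a, const float *b, const float *c, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    size_t strips = 0;
    while (n > 0) {
        size_t vl = emulated_vsetvl(n);
        for (size_t i = 0; i < vl; i++) /* stands in for one vector operation */
            a[i] = b[i] + c[i];
        a += vl; b += vl; c += vl;
        n -= vl;
        strips++;
    }
    return strips;
}
```

With 10 elements and the assumed limit of 4, the loop runs three strips of 4, 4, and 2 elements, so no separate scalar tail loop is needed.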
OpenCL Runtime Support
• The runtime is composed of a host layer and a device layer
– The host layer is portable across targets
– The device layer is designed to be ported to different platforms
• Device query scheme for the OpenCL platform layer
• Kernel launching scheme for the OpenCL runtime
[Figure: runtime layout. Host (x86): host layer, device layer. Devices (AndeSim): multiple NX27V cores, each with local memory, and device memory. Links between the two: device query, kernel launching, GDB.]
OpenCL Compilation Flow
• Clang's OpenCL C frontend translates the OpenCL kernel into SPIR
– SPIR is an intermediate language for parallel computation defined by Khronos
– SPIR is based on LLVM IR
• SPIR is then translated to LLVM IR
– The IR must be compatible with the RISC-V ABI
[Figure: compilation flow. Stages shown: kernel function (.cl), OpenCL C frontend (Clang), SPIR, SPIR to LLVM IR (RISC-V ABI), work item grouping, RISC-V codegen, target binary, with LLVM IR passed between the later stages.]
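The slides do not describe the work item grouping stage in detail. One common way to run a whole work group on a single core is to wrap the kernel body in a loop over the group's work items, and the sketch below shows that idea in plain C. The transformation and the names `vadd_work_item` and `vadd_work_group` are assumptions for illustration, not the Andes implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel body for a single work item; gid plays the role of get_global_id(0). */
void vadd_work_item(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Work-group function: executes every work item of one group in a loop,
 * so a single core can run the whole group. */
void vadd_work_group(size_t group_id, size_t local_size,
                     const float *a, const float *b, float *c)
{
    assert(local_size > 0);
    for (size_t local_id = 0; local_id < local_size; local_id++)
        vadd_work_item(group_id * local_size + local_id, a, b, c);
}
```

Calling `vadd_work_group(2, 4, a, b, c)` processes elements 8 through 11 and leaves the rest of `c` untouched.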
Status
• Platforms
– QEMU (host and device are both Andes RISC-V cores)
– x86 + AndeSim (NX27V)
– Target: RV64GVC
• OpenCL Conformance Tests (CTS)
– QEMU: most cases pass; the remaining issues are to be clarified with upstream
– x86 + AndeSim: ongoing
• RVV intrinsic examples for optimization targets
• Next: optimizations for RVV compilation and the host framework
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Andes open cl for RISC-V

  • 1. OpenCL for RISC-V. Shao-Chung Wang, RISC-V Summit, Dec. 8, 2020
  • 2. Agenda: (1) OpenCL Introduction; (2) OpenCL Extension for RVV Cores; (3) OpenCL Framework for RISC-V; (4) Status
  • 3. Open Computing Language
      • A popular framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, DSPs, FPGAs, or hardware accelerators, with a host and multiple devices
      • Examples of host → devices pairs
          – x86 → multiple Andes NX27V (vector processors)
          – Andes AX45MP → multiple Andes NX27V
          – Andes AX45MP → multiple HW accelerators
  • 4. Open Computing Language
      • OpenCL Runtime
          – Platform Layer API: query, select, and initialize compute devices
          – Runtime API: build and dispatch kernel programs; resource management
      • OpenCL kernel language
          – OpenCL C: a subset of C99 with language extensions
          – A set of built-in functions
  • 5. OpenCL Framework Overview
      • OpenCL programs include host and kernel code fragments
          – The host program runs on the CPU
          – Kernels can run on both the CPU and hardware accelerators
      [Diagram labels: Application (Host Program, OpenCL Kernels); OpenCL Runtime (Platform API, Runtime API); OpenCL Compiler (Frontend, Backend); CPU; Hardware Accelerators]
  • 6. Example: Vector Addition
      • A simple C program uses a loop to add two vectors.
      • It is the "hello world" program of data-parallel programming.

      /* Element-wise addition: a[i] = b[i] + c[i] for i in [0, n). */
      void vadd(float *a, float *b, float *c, int n)
      {
          int i;
          for (i = 0; i < n; i++) {
              a[i] = b[i] + c[i];
          }
      }

      int main()
      {
          /* float arrays to match vadd's parameter types; inputs are zero-initialized */
          float a[100], b[100] = {0}, c[100] = {0};
          vadd(a, b, c, 100);
          return 0;
      }
  • 7. Expressing Data Parallelism in OpenCL
      • Work Item: the smallest parallel execution unit
          – Define a problem domain to execute the kernel
      • Work Group: a set of work items
          – Work items in the same group can be synchronized
      • For vector addition, the problem domain is defined as follows
          – Two arrays of 100 elements are processed
          – One kernel instance executes the addition for one array element
          – 100 kernel instances are executed in total
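The relationship between work items and work groups on slide 7 can be sketched in plain C. This is only a sequential emulation under stated assumptions, not how an OpenCL runtime is implemented: `run_ndrange` and `vadd_work_item` are hypothetical names, and the sketch assumes a one-dimensional problem domain whose global size is a multiple of the work-group size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one kernel instance (work item):
 * it adds exactly one array element, selected by its global ID. */
static void vadd_work_item(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Sequentially emulates a 1-D problem domain of global_size work items
 * split into work groups of local_size items each.
 * Returns the number of work items that were executed. */
size_t run_ndrange(size_t global_size, size_t local_size,
                   const float *a, const float *b, float *c)
{
    /* Assumption of this sketch: groups divide the domain evenly. */
    assert(local_size > 0 && global_size % local_size == 0);

    size_t executed = 0;
    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; group++) {
        for (size_t lid = 0; lid < local_size; lid++) {
            /* Global ID = group index * group size + local ID. */
            size_t gid = group * local_size + lid;
            vadd_work_item(gid, a, b, c);
            executed++;
        }
    }
    return executed;
}
```

Calling `run_ndrange(100, 25, a, b, c)` would walk four work groups of 25 work items each, which matches the slide's count of 100 kernel instances for 100 array elements.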
  • 8. Vector Addition (Kernel)

      __kernel void vadd(__global const float *a,
                         __global const float *b,
                         __global float *c)
      {
          int gid = get_global_id(0);
          c[gid] = a[gid] + b[gid];
      }

      • __kernel: function qualifier that identifies the function as a kernel
      • __global: address space qualifier (__private, __local, __global, or __constant) that annotates where the data is located
      • get_global_id(0): built-in function that returns the unique ID of the work item
  • 9. Vector Addition (Host Program)

      int main()
      {
          ……
          /* Set the platforms and queues */
          clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &cb);
          clGetContextInfo(context, CL_CONTEXT_DEVICES, cb, devices, NULL);
          cmd_queue = clCreateCommandQueue(context, devices[0], 0, NULL);
          ……
          /* Allocate memory buffer */
          memobjs[0] = clCreateBuffer(context, CL_MEM_COPY_HOST_PTR, …);
          ……
          /* Build kernel (or load binary) */
          program = clCreateProgramWithSource(context, 1, &program_source, …);
          clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
          /* Setup kernel */
          kernel = clCreateKernel(program, "vadd", NULL);
          clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *) &memobjs[0]);
          ……
          /* Execute the kernel */
          global_work_size[0] = n;
          clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, global_work_size, …);
          /* Read result from the device */
          clEnqueueReadBuffer(cmd_queue, memobjs[2], CL_TRUE, 0, …);
          ……
      }
  • 10. An Example OpenCL Platform
      • Host: x86
      • Devices: 32 NX27V cores with RVV support
      • Each core runs one or more work items at a time
      • The host runtime dispatches work groups to multiple cores for parallel execution
      [Diagram labels: Host (x86, Cache, Host Memory); Device (Device Memory; multiple NX27V cores, each with Local MEM)]
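Slide 10 says that the host runtime dispatches work groups to multiple cores but does not say which policy it uses. As an illustration only, the sketch below assumes a simple round-robin policy; `core_for_group` and `dispatch_round_robin` are hypothetical names, and only the core count of 32 comes from the slide.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 32  /* from the slide: 32 NX27V cores */

/* Hypothetical policy (an assumption, not the presented runtime's scheme):
 * work group g is assigned to core g mod NUM_CORES. */
size_t core_for_group(size_t group_id)
{
    return group_id % NUM_CORES;
}

/* Counts how many work groups each core receives under round-robin
 * dispatch. Fills per_core[0..NUM_CORES-1] and returns the largest
 * per-core load, which bounds the number of sequential rounds. */
size_t dispatch_round_robin(size_t num_groups, size_t per_core[NUM_CORES])
{
    assert(per_core != NULL);

    for (size_t core = 0; core < NUM_CORES; core++) {
        per_core[core] = 0;
    }
    for (size_t group = 0; group < num_groups; group++) {
        per_core[core_for_group(group)]++;
    }

    size_t max_load = 0;
    for (size_t core = 0; core < NUM_CORES; core++) {
        if (per_core[core] > max_load) {
            max_load = per_core[core];
        }
    }
    return max_load;
}
```

With 100 work groups, the first four cores would each receive four groups and the remaining 28 cores three each, so the busiest core runs four groups back to back.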
  • 11. OpenCL C Extension for RVV
      • Supports RVV intrinsics and new built-in functions

      C with RVV Intrinsic:

      void vadd_rvv(float *a, float *b, float *c, int n)
      {
          int tn = n;
          while (tn > 0) {
              size_t vl = vsetvl_e32m1(tn);
              vfloat32m1_t vb = vle32_v_f32m1(b);
              vfloat32m1_t vc = vle32_v_f32m1(c);
              vfloat32m1_t va = vadd_vv_f32m1(vb, vc);
              vse32_v_f32m1(a, va);
              a += vl;
              b += vl;
              c += vl;
              tn -= vl;
          }
      }

      OpenCL Kernel with RVV Intrinsic:

      __kernel void vadd_rvv_cl(__global float *a, __global float *b,
                                __global float *c, int n)
      {
          // return the index of the first element
          // to be executed by a work item
          int wi = get_work_id(sizeof(float), n, 0);
          vfloat32m1_t vb = vle32_v_f32m1(&b[wi]);
          vfloat32m1_t vc = vle32_v_f32m1(&c[wi]);
          vfloat32m1_t va = vadd_vv_f32m1(vb, vc);
          vse32_v_f32m1(&a[wi], va);
      }
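The strip-mining loop in `vadd_rvv` on slide 11 can be tried without RVV hardware by emulating it in portable C. The sketch below is an assumption-laden stand-in, not the RVV intrinsics themselves: `EMU_VLMAX`, `emu_vsetvl`, and `vadd_stripmine` are hypothetical names, the vector length of 4 elements is arbitrary, and `emu_vsetvl` simply returns the smaller of the remaining count and the maximum, which is one behavior the real instruction may exhibit.

```c
#include <assert.h>
#include <stddef.h>

#define EMU_VLMAX 4  /* hypothetical number of float lanes per vector */

/* Emulated vsetvl: how many elements this iteration processes. */
static size_t emu_vsetvl(size_t remaining)
{
    return remaining < EMU_VLMAX ? remaining : EMU_VLMAX;
}

/* Strip-mined a[i] = b[i] + c[i] with the same loop shape as vadd_rvv.
 * The inner loop stands in for the vector load, add, and store.
 * Returns the number of strip-mining iterations taken. */
size_t vadd_stripmine(float *a, const float *b, const float *c, size_t n)
{
    size_t iterations = 0;
    size_t tn = n;
    while (tn > 0) {
        size_t vl = emu_vsetvl(tn);
        assert(vl > 0 && vl <= tn);  /* guarantees progress and no overrun */
        for (size_t i = 0; i < vl; i++) {
            a[i] = b[i] + c[i];
        }
        /* Advance all pointers by the number of elements just processed. */
        a += vl;
        b += vl;
        c += vl;
        tn -= vl;
        iterations++;
    }
    return iterations;
}
```

For ten elements and four lanes, the loop would take three iterations (4, 4, then 2 elements), which shows why the tail needs no separate scalar cleanup loop.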
  • 12. OpenCL Runtime Support
      • The runtime is composed of a host layer and a device layer
          – The host layer is portable across targets
          – The device layer is designed to be ported to different platforms
      • Device query scheme for the OpenCL platform layer
      • Kernel launching scheme for the OpenCL runtime
      [Diagram labels: Host (x86) with Host Layer and Device Layer; Device Query, Kernel Launching, GDB; Devices (AndeSim) with Device Memory and multiple NX27V cores, each with Local MEM]
  • 13. OpenCL Compilation Flow
      • OpenCL Clang translates the OpenCL kernel into SPIR
          – SPIR is an intermediate language for parallel computation defined by Khronos
          – SPIR is based on LLVM IR
      • SPIR is then translated to LLVM IR
          – The IR must be compatible with the RISC-V ABI
      [Diagram labels: Kernel Function (.cl), OpenCL C Frontend (Clang), SPIR, SPIR to LLVM IR (RISC-V ABI), LLVM IR, Work Item Grouping, LLVM IR, RISC-V Codegen, Target Binary]
  • 14. Status
      • Platforms
          • QEMU (host and device are both Andes RISC-V cores)
          • x86 + AndeSim (NX27V)
          – Target: RV64GVC
      • OpenCL Conformance Tests (CTS)
          – QEMU: most cases pass
              • Issues to be clarified with upstream
          – x86 + AndeSim: ongoing
      • RVV intrinsic examples for optimization targets
      • Next: optimizations on RVV compilation and the host framework