PROGRAMMING BY CUDA
Direct parallelization & CUBLAS
C.M. WANG
Research Assistant
OUTLINE
• Preparation
• CUBLAS
• Direct Parallelization
PREPARATION
Things to do before you start coding
SET UP THE PATH
so the shell can find the compiler (nvcc) and the loader can find the CUDA libraries
1. Open the bash profile:
   [YourAccount@John ~]$ ls -a
   [YourAccount@John ~]$ vi .bash_profile
2. Add these lines to the file:
   export PATH=/usr/local/cuda-5.0/bin:$PATH
   export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib:/usr/local/cuda-5.0/lib64:$LD_LIBRARY_PATH
CREATE THE MAKEFILE
to configure the compilation of your source code
Create a makefile something like this (the recipe line must start with a tab):
MAIN=filename
${MAIN}.e: ${MAIN}.cu
	nvcc ${MAIN}.cu -o ${MAIN}.e -m64 -arch sm_35 -lcublas -O3
CHECK YOUR MEMORY
so the GPU can actually hold the data used in the computation
Global memory available on one card: 5 GB.
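One way to make this check concrete is to compare the bytes the arrays need against what the CUDA runtime reports as free. The sketch below assumes the problem used later in these slides (an n x n matrix of doubles plus two length-n vectors); the helper names bytes_needed and fits_on_device are hypothetical and not from the slides.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Bytes required on the device for an n x n matrix plus two length-n
// vectors of doubles (the arrays m, v and a used in these slides).
// Hypothetical helper, not part of the original deck.
size_t bytes_needed(size_t n) {
    assert(n > 0);
    return (n * n + 2 * n) * sizeof(double);
}

// Asks the CUDA runtime how much global memory is free on the current
// device and reports whether `bytes` would fit. Returns false if the
// query itself fails (for example, when no device is present).
bool fits_on_device(size_t bytes) {
    size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        return false;
    }
    printf("free: %zu bytes, total: %zu bytes, needed: %zu bytes\n",
           free_bytes, total_bytes, bytes);
    return bytes <= free_bytes;
}
```

Calling fits_on_device(bytes_needed(n)) before any cudaMalloc gives an early warning when n is too large for the card.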
CUBLAS
BLAS implemented on GPU via CUDA
Steps of a CUBLAS program:
1. Declaration of Host Array
2. Declaration of Device Array
3. Assignment of CUDA Device
4. Allocation of Device Array
5. Initialization of CUBLAS
6. Copy Array : Host to Device
7. Execution of CUBLAS
8. Copy Array : Device to Host
9. Termination of CUBLAS
10. De-allocation of Device Array
#include <cuda_runtime.h>
#include <cublas_v2.h>
…
double* M;   /* host array */
double* m;   /* device array */
/* similar for V & v & A & a */
…
cudaSetDevice(0);
…
TS=sizeof(double);
size=n*n*TS;
cudaMalloc( (void**)&m, size );
/* similar for v & a, with n*TS bytes each */
…
cublasStatus_t status;
cublasHandle_t handle;
status=cublasCreate(&handle);
…
…
cublasSetVector(n*n,TS,M,1,m,1);
/* similar for V & v */
…
cublasDgemv( handle,
CUBLAS_OP_N,
n, n, &alpha, m, n, v, 1, &beta, a, 1
);
…
cublasGetVector(n,TS,a,1,A,1);
…
cublasDestroy(handle);
…
cudaFree(m);
/* similar for v & a */
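Put together, the CUBLAS fragments above might look like the following sketch. It assumes the matrix is stored in column-major order (the layout CUBLAS expects) and uses alpha = 1 and beta = 0, which the slides leave unspecified; the wrapper name gemv_cublas and the use of std::vector for the host arrays are illustrative choices, not the author's code.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Computes A = M * V on the GPU with cublasDgemv.
// M is an n x n matrix in column-major order; V and A have length n.
// Returns true only if every CUDA and CUBLAS call succeeded.
bool gemv_cublas(const std::vector<double>& M, const std::vector<double>& V,
                 std::vector<double>& A, int n) {
    assert(n > 0 && M.size() == (size_t)n * n && V.size() == (size_t)n);
    const size_t TS = sizeof(double);
    const double alpha = 1.0, beta = 0.0;   // assumed values: a = 1*m*v + 0*a
    double *m = nullptr, *v = nullptr, *a = nullptr;
    cublasHandle_t handle = nullptr;
    A.assign(n, 0.0);

    // Assignment of CUDA device and allocation of device arrays.
    bool ok = cudaSetDevice(0) == cudaSuccess
           && cudaMalloc((void**)&m, (size_t)n * n * TS) == cudaSuccess
           && cudaMalloc((void**)&v, n * TS) == cudaSuccess
           && cudaMalloc((void**)&a, n * TS) == cudaSuccess;

    // Initialization of CUBLAS.
    bool have_handle = ok && cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;

    // Copy host to device, execute, copy device to host.
    ok = have_handle
      && cublasSetVector(n * n, (int)TS, M.data(), 1, m, 1) == CUBLAS_STATUS_SUCCESS
      && cublasSetVector(n, (int)TS, V.data(), 1, v, 1) == CUBLAS_STATUS_SUCCESS
      && cublasDgemv(handle, CUBLAS_OP_N, n, n, &alpha, m, n, v, 1, &beta, a, 1)
             == CUBLAS_STATUS_SUCCESS
      && cublasGetVector(n, (int)TS, a, 1, A.data(), 1) == CUBLAS_STATUS_SUCCESS;

    // Termination of CUBLAS and de-allocation of device arrays.
    if (have_handle) cublasDestroy(handle);
    cudaFree(m);
    cudaFree(v);
    cudaFree(a);
    return ok;
}
```

For the 2 x 2 matrix with rows (1, 2) and (3, 4), the column-major host array is {1, 3, 2, 4}.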
DIRECT PARALLELIZATION
Assign the work to each thread directly
Direct Parallelization
[Figure: the thread hierarchy. A grid is made up of blocks, and each block is made up of threads. gridDim.x and gridDim.y give the number of blocks in the grid; blockIdx.x and blockIdx.y locate one block within the grid (the example highlights block (1,1)). blockDim.x and blockDim.y give the number of threads in a block; threadIdx.x and threadIdx.y locate one thread within its block (the example highlights thread (0,1)).]
Direct Parallelization
GPU_ID = blockDim.x * blockIdx.x + threadIdx.x
[Figure: a grid of 5 blocks with 2 threads each, giving GPU_ID values 0 1 2 3 4 5 6 7 8 9.]
Example: GPU_ID = 2 * 3 + 1 = 7 (blockDim.x = 2, blockIdx.x = 3, threadIdx.x = 1)
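The ID formula can be checked with a small kernel in which every thread writes its own GPU_ID into an array. This is a minimal sketch; the names write_global_id and collect_global_ids are hypothetical, and error handling is reduced to leaving -1 in the output when something fails.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread computes its global index and stores it at that position.
__global__ void write_global_id(int* out) {
    int gpu_id = blockDim.x * blockIdx.x + threadIdx.x;
    out[gpu_id] = gpu_id;
}

// Launches `blocks` blocks of `threads` threads each and returns the IDs
// the threads computed. Entries stay at -1 if the device work failed.
std::vector<int> collect_global_ids(int blocks, int threads) {
    assert(blocks > 0 && threads > 0);
    const int total = blocks * threads;
    std::vector<int> host(total, -1);
    int* dev = nullptr;
    if (cudaMalloc((void**)&dev, total * sizeof(int)) != cudaSuccess) {
        return host;
    }
    write_global_id<<<blocks, threads>>>(dev);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(host.data(), dev, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

With collect_global_ids(5, 2), the second thread of the fourth block (blockIdx.x = 3, threadIdx.x = 1) fills position 7, matching the worked example.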
Direct Parallelization
1. Declaration of Host Array
2. Declaration of Device Array
3. Assignment of CUDA Device
4. Allocation of Device Array
5. Copy Array : Host to Device
6. Determination of Size for Grid & Block
7. Execution of Kernel
8. Copy Array : Device to Host
9. De-allocation of Device Array
#include <cuda_runtime.h>
#define IJToIdx(i,j,n) ((j)*(n)+(i))   /* column-major index of element (i,j) */
…
double* M;   /* host array */
double* m;   /* device array */
/* similar for V & v & A & a */
…
cudaSetDevice(0);
…
TS=sizeof(double);
size=n*n*TS;
cudaMalloc( (void**)&m, size );
/* similar for v & a, with n*TS bytes each */
…
cudaMemcpy(m, M, size,
cudaMemcpyHostToDevice);
/* similar for V & v, with n*TS bytes */
…
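Determination of Size for Grid & Block
Things to consider:
1. Memory assessment
2. Memory alignment
3. Data flow
Use as many threads as possible: each thread computes one element a[i] as row i of m times the vector v.
a[i] = [ m[i1] … m[in] ] * [ v[1] … v[j] … v[n] ]^T = m[i1]*v[1] + … + m[in]*v[n]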
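When each block holds more than one thread, the number of blocks is usually found by rounding n up to a whole number of blocks. The sketch below shows that calculation; the default of 128 threads per block is an illustrative assumption, not a value from the slides, and the helper names blocks_for and launch_shape are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Smallest number of blocks such that blocks * threads_per_block >= n.
inline unsigned int blocks_for(unsigned int n, unsigned int threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Picks a one-dimensional launch shape for n independent work items on
// device 0. The block size is capped by the device limit reported by the
// runtime. Returns false if the device properties cannot be read.
inline bool launch_shape(unsigned int n, dim3* grid, dim3* block) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        return false;
    }
    unsigned int threads = 128;  // assumed starting point, tune per kernel
    if (threads > (unsigned int)prop.maxThreadsPerBlock) {
        threads = (unsigned int)prop.maxThreadsPerBlock;
    }
    *block = dim3(threads);
    *grid = dim3(blocks_for(n, threads));
    return true;
}
```

Because the last block may contain threads with an index of n or more, a kernel launched with this shape needs a guard such as if (i >= n) return; before it touches memory.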
…
My_Dgemv<<<n,1>>>( … );   /* n blocks of 1 thread each */
…
__global__ void My_Dgemv( … ){
/* algorithm for MV */
}
__global__ void My_Dgemv( … ){
…
id=blockIdx.x;   /* one block per row of the matrix */
i=id;
a[i]=0;
for(j=0;j<n;j++){
a[i]=a[i]+m[ IJToIdx(i,j,n) ]*v[j];
}
}
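A complete version of this kernel, together with the host steps from the list, could look like the sketch below. It keeps the <<<n,1>>> launch from the slides (one block per row, one thread per block) and the column-major IJToIdx macro; the parameter list, the local accumulator sum and the wrapper name gemv_direct are assumptions added for illustration.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Column-major index of element (i, j) in an n x n matrix.
#define IJToIdx(i, j, n) ((j) * (n) + (i))

// One block per row: block i computes a[i] = sum over j of m(i,j) * v[j].
__global__ void my_dgemv(const double* m, const double* v, double* a, int n) {
    int i = blockIdx.x;
    double sum = 0.0;   // accumulate locally, write to global memory once
    for (int j = 0; j < n; j++) {
        sum += m[IJToIdx(i, j, n)] * v[j];
    }
    a[i] = sum;
}

// Host side: allocate, copy in, launch, copy out, free.
// Returns true only if every CUDA call succeeded.
bool gemv_direct(const std::vector<double>& M, const std::vector<double>& V,
                 std::vector<double>& A, int n) {
    assert(n > 0 && M.size() == (size_t)n * n && V.size() == (size_t)n);
    const size_t TS = sizeof(double);
    double *m = nullptr, *v = nullptr, *a = nullptr;
    A.assign(n, 0.0);

    bool ok = cudaSetDevice(0) == cudaSuccess
           && cudaMalloc((void**)&m, (size_t)n * n * TS) == cudaSuccess
           && cudaMalloc((void**)&v, n * TS) == cudaSuccess
           && cudaMalloc((void**)&a, n * TS) == cudaSuccess
           && cudaMemcpy(m, M.data(), (size_t)n * n * TS,
                         cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(v, V.data(), n * TS,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        my_dgemv<<<n, 1>>>(m, v, a, n);   // grid of n blocks, 1 thread each
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(A.data(), a, n * TS,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(m);
    cudaFree(v);
    cudaFree(a);
    return ok;
}
```

Only the result vector is copied back, since the kernel never modifies m or v.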
…
cudaMemcpy(M, m, size,
cudaMemcpyDeviceToHost);
/* similar for A & a, with n*TS bytes */
...
cudaFree(m);
/* similar for v & a */
…
[Figure: Performance. Time (ms) plotted against the dimension of the vector.]
END OF THE REPORT
Thank you for your attention
C.M. WANG
Research Assistant

2013 0928 programming by cuda

  • 1. CUDA PROGRAMMING BY Direct Parallelization & CUBLAS. C.M. WANG, Research Assistant
  • 2. OUTLINE: Preparation, CUBLAS, Direct Parallelization
  • 4. SET UP THE PATH so your compiler knows where to find the libraries
      1. Open the bash profile:
         [YourAccount@John ~]$ ls -a
         [YourAccount@John ~]$ vi .bash_profile
      2. Add these lines to the file:
         export PATH=/usr/local/cuda-5.0/bin:$PATH
         export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib:/usr/local/cuda-5.0/lib64:$LD_LIBRARY_PATH
  • 5. CREATE THE MAKEFILE to configure your compilation for the source code. Create a makefile something like this (the nvcc line must begin with a tab):
      MAIN=filename
      ${MAIN}.e: ${MAIN}.cu
          nvcc ${MAIN}.cu -o ${MAIN}.e -m64 -arch sm_35 -lcublas -O3
  • 6. CHECK YOUR MEMORY so the GPU can actually hold the data used in the computation. Global memory available on one card: 5 GB.
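The memory check on slide 6 can also be done from code. The following is a minimal sketch, not part of the original slides: it asks the CUDA runtime how much global memory is free on device 0 and compares that with a hypothetical workload of one n x n matrix and two length-n vectors of doubles. The value of `n` and the variable names are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Select the first GPU, as the slides do with cudaSetDevice(0).
    cudaError_t err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ask the runtime for free and total global memory, in bytes.
    size_t free_bytes = 0, total_bytes = 0;
    err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Hypothetical problem size: matrix m (n*n) plus vectors v and a (n each).
    const size_t n = 10000;
    const size_t needed = (n * n + 2 * n) * sizeof(double);

    const double gib = 1024.0 * 1024.0 * 1024.0;
    printf("total %.2f GiB, free %.2f GiB, needed %.2f GiB\n",
           total_bytes / gib, free_bytes / gib, needed / gib);

    if (needed > free_bytes) {
        printf("The arrays would not fit in global memory.\n");
        return 1;
    }
    printf("The arrays should fit in global memory.\n");
    return 0;
}
```

Compiled with nvcc as in the makefile on slide 5 (the `-lcublas` flag is not needed for this program), it prints the figures for whichever card is installed.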
  • 8. CUBLAS steps: Declaration of Host Array; Declaration of Device Array; Assignment of CUDA Device; Allocation of Device Array; Initialization of CUBLAS; Copy Array: Host to Device; Execution of CUBLAS; Copy Array: Device to Host; Termination of CUBLAS; De-allocation of Device Array
      #include <cuda_runtime.h>
      #include <cublas_v2.h>
      …
      double* M;   /* host array */
      double* m;   /* device array */
      /* similar for V & v & A & a */
      …
      cudaSetDevice(0);
  • 9. CUBLAS
      …
      TS=sizeof(double);
      size=n*n*TS;
      cudaMalloc( (void**)&m, size );
      /* similar for v & a, with n*TS bytes each */
      …
      cublasStatus_t status;
      cublasHandle_t handle;
      status=cublasCreate(&handle);
      ...
  • 10. CUBLAS
      …
      cublasSetVector(n*n,TS,M,1,m,1);
      /* similar for V & v */
      …
      cublasDgemv( handle, CUBLAS_OP_N, n, n, &alpha, m, n, v, 1, &beta, a, 1 );
  • 11. CUBLAS
      …
      cublasGetVector(n,TS,a,1,A,1);
      …
      cublasDestroy(handle);
      …
      cudaFree(m);
      /* similar for v & a */
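The fragments on slides 8 to 11 can be assembled into one small program. This is a sketch under stated assumptions rather than the author's actual source: the size `n = 4`, the test data (M is twice the identity, V counts from 1) and the minimal error handling are illustrative choices. The array names and the cuBLAS calls follow the slides.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 4;                      // assumed small size for the demo
    const double alpha = 1.0, beta = 0.0; // a = alpha*m*v + beta*a
    const size_t TS = sizeof(double);

    // Host arrays (upper case, as in the slides). M is column-major.
    double *M = (double *)malloc(n * n * TS);
    double *V = (double *)malloc(n * TS);
    double *A = (double *)malloc(n * TS);
    if (!M || !V || !A) return EXIT_FAILURE;
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) M[j * n + i] = (i == j) ? 2.0 : 0.0;
        V[j] = j + 1.0;
    }

    // Assignment of CUDA device and allocation of device arrays.
    cudaSetDevice(0);
    double *m = NULL, *v = NULL, *a = NULL;
    if (cudaMalloc((void **)&m, n * n * TS) != cudaSuccess ||
        cudaMalloc((void **)&v, n * TS) != cudaSuccess ||
        cudaMalloc((void **)&a, n * TS) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return EXIT_FAILURE;
    }

    // Initialization of CUBLAS.
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed\n");
        return EXIT_FAILURE;
    }

    // Copy host to device, run the matrix-vector product, copy back.
    cublasSetVector(n * n, TS, M, 1, m, 1);
    cublasSetVector(n, TS, V, 1, v, 1);
    cublasStatus_t status = cublasDgemv(handle, CUBLAS_OP_N, n, n,
                                        &alpha, m, n, v, 1, &beta, a, 1);
    if (status != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasDgemv failed\n");
        return EXIT_FAILURE;
    }
    cublasGetVector(n, TS, a, 1, A, 1);

    // With M = 2*I, each A[i] should be twice V[i].
    for (int i = 0; i < n; i++) printf("A[%d] = %g\n", i, A[i]);

    // Termination of CUBLAS and de-allocation.
    cublasDestroy(handle);
    cudaFree(m); cudaFree(v); cudaFree(a);
    free(M); free(V); free(A);
    return 0;
}
```

Because cuBLAS assumes column-major storage, the leading dimension passed to `cublasDgemv` is `n`, matching the `IJToIdx(i,j,n)` convention used later in the deck.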
  • 12. DIRECT PARALLELIZATION: assign the job to each thread directly
  • 15. Direct Parallelization [Figure: a grid made of blocks, each block made of threads; the grid extents are gridDim.x and gridDim.y]
  • 18. Direct Parallelization: GPU_ID = blockDim.x * blockIdx.x + threadIdx.x [Figure: a grid of five blocks with two threads each]
  • 19. Direct Parallelization: GPU_ID = blockDim.x * blockIdx.x + threadIdx.x. Example: GPU_ID = 2 * 3 + 1 = 7. [Figure: a grid of five blocks with two threads each, with GPU_ID running from 0 to 9]
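The ID formula on slides 18 and 19 can be checked with a tiny kernel. The sketch below is an illustration, not code from the slides: the kernel name `write_global_id` and the output array are hypothetical, and the launch uses five blocks of two threads to match the slide's picture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes its global ID and stores it at that position.
__global__ void write_global_id(int *out, int count) {
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    if (id < count) out[id] = id;
}

int main() {
    const int blocks = 5;             // as drawn on the slide
    const int threads_per_block = 2;  // as drawn on the slide
    const int count = blocks * threads_per_block;

    int host[count];
    int *dev = NULL;
    if (cudaMalloc((void **)&dev, count * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    write_global_id<<<blocks, threads_per_block>>>(dev, count);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(host, dev, count * sizeof(int), cudaMemcpyDeviceToHost);

    for (int k = 0; k < count; k++) printf("%d ", host[k]);
    printf("\n");

    cudaFree(dev);
    return 0;
}
```

Thread 1 of block 3 writes position 2 * 3 + 1 = 7, which is the case worked out on slide 19.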
  • 20. Direct Parallelization steps: Declaration of Host Array; Declaration of Device Array; Assignment of CUDA Device; Allocation of Device Array; Copy Array: Host to Device; Determination of Size for Grid & Block; Execution of Kernel; Copy Array: Device to Host; De-allocation of Device Array
      #include <cuda_runtime.h>
      #define IJToIdx(i,j,n) ((j)*(n)+(i))
      …
      double* M;   /* host array */
      double* m;   /* device array */
      /* similar for V & v & A & a */
      …
      cudaSetDevice(0);
  • 21. Direct Parallelization
      …
      TS=sizeof(double);
      size=n*n*TS;
      cudaMalloc( (void**)&m, size );
      /* similar for v & a, with n*TS bytes each */
      …
      cudaMemcpy(m, M, size, cudaMemcpyHostToDevice);
      /* similar for V & v */
      ...
  • 22. Direct Parallelization: Determination of Size for Grid & Block. Consider: 1. Memory assessment; 2. Memory alignment; 3. Data flow. Use as many threads as possible: each a[i] is row i of m times the vector v, a[i] = m[i1]*v[1] + … + m[ij]*v[j] + … + m[in]*v[n]
  • 23. Direct Parallelization
      …
      My_Dgemv<<<n,1>>>( … );
      …
      __global__ void My_Dgemv( … ){
          /* algorithm for MV */
      }
  • 24. Direct Parallelization
      __global__ void My_Dgemv( … ){
          …
          id=blockIdx.x;   /* one block per row, launched as <<<n,1>>> */
          i=id;
          a[i]=0;
          for(j=0;j<n;j++){
              a[i]=a[i]+m[ IJToIdx(i,j,n) ]*v[j];
          }
      }
  • 25. Direct Parallelization
      …
      cudaMemcpy(M, m, size, cudaMemcpyDeviceToHost);
      /* similar for A & a, with n*TS bytes */
      ...
      cudaFree(m);
      /* similar for v & a */
      …
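Slides 20 to 25 can likewise be combined into one program. The sketch below is a variant, not the author's code: instead of the `<<<n,1>>>` launch on slide 23 it uses several threads per block together with the global ID formula from slide 18. The kernel name `dgemv_one_thread_per_row`, the `CHECK` macro, the block size of 256 and the all-ones test data are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Column-major index, as in the slides (arguments parenthesized for safety).
#define IJToIdx(i, j, n) ((j) * (n) + (i))

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t e_ = (call);                                       \
        if (e_ != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(e_));                           \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// One thread computes one element: a[i] = sum over j of m(i,j) * v[j].
__global__ void dgemv_one_thread_per_row(int n, const double *m,
                                         const double *v, double *a) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;  // global thread ID
    if (i >= n) return;  // the last block may contain spare threads
    double sum = 0.0;
    for (int j = 0; j < n; j++) {
        sum += m[IJToIdx(i, j, n)] * v[j];
    }
    a[i] = sum;
}

int main() {
    const int n = 1000;
    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;  // round up
    const size_t mat_bytes = (size_t)n * n * sizeof(double);
    const size_t vec_bytes = (size_t)n * sizeof(double);

    // Host arrays, filled with ones so that every a[i] should equal n.
    double *M = (double *)malloc(mat_bytes);
    double *V = (double *)malloc(vec_bytes);
    double *A = (double *)malloc(vec_bytes);
    if (!M || !V || !A) return EXIT_FAILURE;
    for (int k = 0; k < n * n; k++) M[k] = 1.0;
    for (int k = 0; k < n; k++) V[k] = 1.0;

    CHECK(cudaSetDevice(0));
    double *m = NULL, *v = NULL, *a = NULL;
    CHECK(cudaMalloc((void **)&m, mat_bytes));
    CHECK(cudaMalloc((void **)&v, vec_bytes));
    CHECK(cudaMalloc((void **)&a, vec_bytes));
    CHECK(cudaMemcpy(m, M, mat_bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(v, V, vec_bytes, cudaMemcpyHostToDevice));

    dgemv_one_thread_per_row<<<blocks, threads>>>(n, m, v, a);
    CHECK(cudaGetLastError());  // reports launch errors

    // cudaMemcpy waits for the kernel to finish before copying.
    CHECK(cudaMemcpy(A, a, vec_bytes, cudaMemcpyDeviceToHost));
    printf("a[0] = %g, a[n-1] = %g (both should equal n = %d)\n",
           A[0], A[n - 1], n);

    CHECK(cudaFree(m)); CHECK(cudaFree(v)); CHECK(cudaFree(a));
    free(M); free(V); free(A);
    return 0;
}
```

With column-major storage, neighbouring threads read neighbouring elements of `m` for each `j`, which relates to the "memory alignment" point on slide 22. The bounds check is needed because `blocks * threads` is rounded up and can exceed `n`.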
  • 27. END OF THE REPORT. Thank you for your attention. C.M. WANG, Research Assistant