CUDA Programming
continued
ITCS 4145/5145 Nov 24, 2010 © Barry Wilkinson
Revised
Timing GPU Execution
Can use CUDA "events": create two events and compute
the time between them:
cudaEvent_t start, stop;
float elapsedTime; // elapsed time in milliseconds
cudaEventCreate(&start); // create event objects
cudaEventCreate(&stop);
cudaEventRecord(start, 0); // record start event
.
.
.
cudaEventRecord(stop, 0); // record end event
cudaEventSynchronize(stop); // wait for all work preceding
// cudaEventRecord(stop, 0) to complete
cudaEventElapsedTime(&elapsedTime, start, stop);
// compute elapsed time between events
cudaEventDestroy(start); // destroy start event
cudaEventDestroy(stop); // destroy stop event
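The event calls above can be put around a real kernel launch. The following is a minimal sketch, not the course's own code: the kernel scale() and the helper timeScaleKernel() are hypothetical names introduced only for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to give the GPU some work to time.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns the GPU time of one scale() launch in milliseconds.
float timeScaleKernel(int n) {
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start, 0);                    // record start event
    scale<<<blocks, threads>>>(d_data, n, 2.0f);  // work being timed
    cudaEventRecord(stop, 0);                     // record end event
    cudaEventSynchronize(stop);                   // wait until the stop event has occurred

    float elapsedMs = 0.0f;
    cudaEventElapsedTime(&elapsedMs, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return elapsedMs;
}

int main() {
    printf("scale kernel: %f ms\n", timeScaleKernel(1 << 20));
    return 0;
}
```

Because the events are recorded on the GPU, the measured interval covers the kernel itself and does not depend on when the host thread happens to resume.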
Host Synchronization
Kernels
• Control returned to CPU immediately (asynchronous, non-blocking)
• Kernel starts after all previous CUDA calls completed
cudaMemcpy
• Returns after copy complete (synchronous)
• Copy starts after all previous CUDA calls completed
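The ordering rules above can be seen in a small program. This is a sketch under assumptions: the kernel fill() and the helper countMatches() are hypothetical names, and everything runs in the default stream.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: writes the same value into every element.
__global__ void fill(int *out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Hypothetical helper: returns how many elements equal `value` after the copy back.
int countMatches(int n, int value) {
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fill<<<blocks, threads>>>(d_out, n, value);  // returns to the host immediately

    // The host could do unrelated CPU work here while the kernel runs.

    std::vector<int> h_out(n, 0);
    // cudaMemcpy starts only after the kernel has finished and
    // returns only after the copy is complete.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    int matches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] == value) ++matches;
    }
    return matches;
}

int main() {
    printf("matching elements: %d\n", countMatches(1000, 7));
    return 0;
}
```

No explicit synchronization call is needed before reading h_out, because the blocking copy already orders the host after the kernel.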
CUDA Synchronization Routines
Host
cudaThreadSynchronize()
• Blocks until all previous CUDA calls complete
• Deprecated since CUDA 4.0; cudaDeviceSynchronize() is the replacement
GPU
void __syncthreads()
• Synchronizes all threads in a block
• Barrier: no thread can pass until all threads in the block reach it
• All threads in the thread block must reach __syncthreads()
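A typical use of the block barrier is to separate a phase where threads write shared memory from a phase where they read what other threads wrote. The sketch below reverses one block of data; the kernel reverseInBlock(), the helper reverseWorks(), and the size BLOCK_SIZE are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 64  // assumed block size for this sketch

// Hypothetical kernel: reverses BLOCK_SIZE elements using shared memory.
__global__ void reverseInBlock(int *data) {
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;
    tile[t] = data[t];                    // each thread stages one element
    __syncthreads();                      // all writes to tile finish before any read
    data[t] = tile[BLOCK_SIZE - 1 - t];   // read an element written by another thread
}

// Hypothetical helper: runs the kernel and checks the result on the host.
bool reverseWorks() {
    int h[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i) h[i] = i;

    int *d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    reverseInBlock<<<1, BLOCK_SIZE>>>(d);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < BLOCK_SIZE; ++i) {
        if (h[i] != BLOCK_SIZE - 1 - i) return false;
    }
    return true;
}

int main() {
    printf("reverse %s\n", reverseWorks() ? "ok" : "failed");
    return 0;
}
```

The barrier sits outside any conditional, so every thread of the block reaches it, as the last bullet requires.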
GPU Atomic Operations
Performs a read-modify-write atomic operation on one
word residing in global or shared memory.
Associative operations on signed/unsigned integers,
add, sub, min, max, and, or, xor, increment,
decrement, exchange, compare and swap.
Requires GPU with compute capability 1.1+
(Shared memory operations and 64-bit words require higher capability)
coit-grid06 Tesla C2050 has compute capability 2.0
See http://www.nvidia.com/object/cuda_gpus.html for GPU compute capabilities
Atomic Operation Example
int atomicAdd(int* address, int val);
Reads the word old located at address address in global or shared
memory, computes (old + val), and stores the result back to
memory at the same address.
These three operations (read, compute, and write) are
performed in one atomic transaction.*
Function returns old.
* Once started, it continues to completion without being interrupted by other
processors. Other processors cannot read or write the memory location once the atomic
operation starts. Mechanism implemented in hardware.
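As an illustration of atomicAdd, many threads can safely increment one shared counter. The following sketch counts the even values in an array; the kernel countEvenKernel() and the helper countEven() are hypothetical names, and the input data is made up for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread that sees an even value bumps the counter.
__global__ void countEvenKernel(const int *data, int n, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] % 2 == 0) {
        atomicAdd(count, 1);  // read, add, and write back as one atomic step
    }
}

// Hypothetical helper: counts the even numbers among 0, 1, ..., n-1 on the GPU.
int countEven(int n) {
    std::vector<int> h_data(n);
    for (int i = 0; i < n; ++i) h_data[i] = i;

    int *d_data = nullptr;
    int *d_count = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_data, h_data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    countEvenKernel<<<blocks, threads>>>(d_data, n, d_count);

    int h_count = 0;
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_count);
    return h_count;
}

int main() {
    printf("even values: %d\n", countEven(1000));
    return 0;
}
```

With a plain `*count = *count + 1` instead of atomicAdd, concurrent threads could read the same old value and some increments would be lost.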
Other operations
int atomicSub(int* address, int val);
int atomicExch(int* address, int val);
int atomicMin(int* address, int val);
int atomicMax(int* address, int val);
unsigned int atomicInc(unsigned int* address, unsigned int val);
unsigned int atomicDec(unsigned int* address, unsigned int val);
int atomicCAS(int* address, int compare, int val); //compare and swap
int atomicAnd(int* address, int val);
int atomicOr(int* address, int val);
int atomicXor(int* address, int val);
Source: NVIDIA CUDA C Programming Guide, version 3.2, 11/9/2010
Compare and Swap
(also called compare and exchange)
int atomicCAS(int* address, int compare, int val);
Reads the word old located at address address in global or shared
memory and compares old with compare. If they are the same, it
stores val at address address; otherwise the word is left unchanged, i.e.:
*address = (old == compare) ? val : old;
The three operations (read, compare, and write) are performed in
one atomic transaction.
The function returns the original value of old.
Also unsigned int and unsigned long long int versions.
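Compare and swap can be used to build atomic operations that have no built-in routine, by retrying until no other thread has changed the word in between. The sketch below builds an atomic integer multiply this way; atomicMulInt(), doubleIt(), and productOfTwos() are hypothetical names, not CUDA library functions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device function: atomically multiplies *address by val.
// Returns the value that was stored before the multiply.
__device__ int atomicMulInt(int *address, int val) {
    int old = *address;
    int assumed;
    do {
        assumed = old;
        // Store assumed * val only if the word still holds `assumed`.
        old = atomicCAS(address, assumed, assumed * val);
    } while (assumed != old);  // another thread got in first: retry with its value
    return old;
}

// Hypothetical kernel: every thread doubles the same word.
__global__ void doubleIt(int *value) {
    atomicMulInt(value, 2);
}

// Hypothetical helper: starts from 1 and lets `threads` threads each double it.
int productOfTwos(int threads) {
    int h_value = 1;
    int *d_value = nullptr;
    cudaMalloc(&d_value, sizeof(int));
    cudaMemcpy(d_value, &h_value, sizeof(int), cudaMemcpyHostToDevice);
    doubleIt<<<1, threads>>>(d_value);
    cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_value);
    return h_value;
}

int main() {
    printf("result: %d\n", productOfTwos(10));
    return 0;
}
```

The loop relies on atomicCAS returning the original value: if it differs from the assumed one, the swap did not happen and the thread tries again.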
Coding Critical Sections with Locks
__device__ int lock = 0; // 0 = unlocked, 1 = locked
__global__ void kernel(...) {
...
do {} while (atomicCAS(&lock, 0, 1)); // spin until lock == 0, then set it to 1
// and continue
... // critical section
lock = 0; // free lock
}
Memory Fences
Threads may see the effects of a series of writes to memory
executed by another thread in a different order. To enforce ordering:
void __threadfence_block();
Waits until all global and shared memory accesses made by the
calling thread prior to __threadfence_block() are visible to all
threads in the thread block.
Other routines:
void __threadfence(); // global memory accesses also visible to all threads in the device
void __threadfence_system(); // additionally visible to host threads for page-locked host memory
Critical sections with memory operations
Writes to device memory are not guaranteed to complete in any particular order,
so global writes made in the critical section may not have completed by the time the
lock is unlocked.
__global__ void kernel(...) {
...
do {} while (atomicCAS(&lock, 0, 1));
... // critical section
__threadfence(); // wait for writes to finish
lock = 0;
}
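A complete version of the lock pattern might look like the sketch below. It makes several assumptions that are not in the slides: only thread 0 of each block takes the lock (on GPUs where the threads of a warp run in lockstep, several threads of one warp spinning on the same lock can hang), the protected counter is accessed through a volatile pointer so reads are not served from a stale cached copy, and the lock is released with atomicExch. The names incrementUnderLock() and countWithLock() are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int lock = 0;  // 0 = unlocked, 1 = locked

// Hypothetical kernel: one thread per block increments a shared counter
// inside a critical section protected by the global lock.
__global__ void incrementUnderLock(volatile int *counter) {
    if (threadIdx.x == 0) {
        while (atomicCAS(&lock, 0, 1) != 0) { }  // spin until the lock is acquired
        *counter = *counter + 1;                 // critical section (non-atomic update)
        __threadfence();                         // make the write visible before unlocking
        atomicExch(&lock, 0);                    // free the lock
    }
}

// Hypothetical helper: launches `blocks` blocks and returns the final counter.
int countWithLock(int blocks) {
    int *d_counter = nullptr;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));

    incrementUnderLock<<<blocks, 32>>>(d_counter);

    int h_counter = 0;
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}

int main() {
    printf("counter: %d\n", countWithLock(64));
    return 0;
}
```

For a simple counter like this, atomicAdd would be the cheaper choice; the lock is only worth its cost when the critical section updates more than one word.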
Error reporting
All CUDA calls (except kernel launches) return an error code of type
cudaError_t
cudaError_t cudaGetLastError(void)
Returns the code for the last error
Can be used to get the error from kernel execution.
const char* cudaGetErrorString(cudaError_t code)
Returns a null-terminated character string describing the error
Example
printf("%s\n", cudaGetErrorString(cudaGetLastError()));
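The calls above are often wrapped in a small checking helper. The following is one possible sketch: checkCuda(), doNothing(), and launchWithBadConfig() are hypothetical names, and the deliberately invalid launch assumes a per-block limit of 1024 threads, which holds on current GPUs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel used only to exercise the launch path.
__global__ void doNothing() {}

// Hypothetical helper: reports a failed call and returns false, otherwise true.
bool checkCuda(cudaError_t code, const char *what) {
    if (code != cudaSuccess) {
        printf("%s failed: %s\n", what, cudaGetErrorString(code));
        return false;
    }
    return true;
}

// Hypothetical helper: launches with too many threads per block and
// returns the error that the launch left behind.
cudaError_t launchWithBadConfig() {
    doNothing<<<1, 2048>>>();    // assumed to exceed the per-block thread limit
    return cudaGetLastError();   // kernel launches return nothing, so ask afterwards
}

int main() {
    // Runtime calls return cudaError_t directly.
    int *d_ptr = nullptr;
    checkCuda(cudaMalloc(&d_ptr, 16 * sizeof(int)), "cudaMalloc");

    // Invalid launch: the error is picked up with cudaGetLastError().
    checkCuda(launchWithBadConfig(), "invalid launch");

    // Valid launch: check the launch, then wait and check execution.
    doNothing<<<1, 32>>>();
    checkCuda(cudaGetLastError(), "kernel launch");
    checkCuda(cudaDeviceSynchronize(), "kernel execution");

    checkCuda(cudaFree(d_ptr), "cudaFree");
    return 0;
}
```

Checking cudaGetLastError() right after the launch catches configuration problems, while the later synchronization call reports errors that occur while the kernel runs.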
Questions
