Presentation HSA-4138, HSAemu – A Full System Emulator for HSA Platform, by Yeh Ching Chung and Jiun-Hung Ding at the AMD Developer Summit (APU13) November 11-13, 2013
1. HSAemu ‐ A Full System Emulator for HSA Platform
Prof. Yeh‐Ching Chung
System Software Laboratory
Department of Computer Science
National Tsing Hua University
National Tsing Hua University ® copyright OIA
6. Specification of Simple HSA Platform
Hardware
– Memory
• Shared Virtual Memory (hUMA)
• Cache Coherency Domains
• Memory‐Based Signaling and
Synchronization for CPU and GPU
– Task Control
• Architected Queuing Language (AQL)
• Efficient Syscall Infrastructure
• Preemptive Context Switching
– Debugging Infrastructure
• Allow system software to set
instruction, memory, conditional, and
other breakpoints
Software
– HSA Runtime APIs
• Initialization of HSA components
• Topology discovery
• Manage AQL packets
• Dispatch application tasks
• Signal HW and wait for result
• Recycle available resources
– User Mode Queue
• Store AQL packets
– Virtual ISA ‐ HSAIL
• A low‐level instruction set designed for
parallel computing
– Exception Handling
• GPU trap handler to trigger GPU
interrupt for GPU exception
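The "Signal HW and wait for result" step listed above can be pictured with a small sketch. It assumes a signal is simply a shared integer that the device side decrements when a task completes and that the host side waits on. The `Signal` class, its method names, and `dispatch_and_wait` are hypothetical illustrations, not the HSA runtime API, and a Python thread stands in for the GPU.

```python
import threading


class Signal:
    """Hypothetical memory-based signal: a shared integer plus a way to wait on it."""

    def __init__(self, initial):
        self._value = initial
        self._cond = threading.Condition()

    def load(self):
        with self._cond:
            return self._value

    def subtract(self, amount=1):
        # Update the value and wake every waiter so it can re-check its condition.
        with self._cond:
            self._value -= amount
            self._cond.notify_all()

    def wait_until_zero(self, timeout=None):
        # Returns True once the value is 0, or False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._value == 0, timeout)


def run_task(task, completion):
    """Stand-in for the device side: run the task, then signal completion."""
    task()
    completion.subtract(1)


def dispatch_and_wait(task):
    """Stand-in for the host side: hand off the task and block on its signal."""
    completion = Signal(1)
    worker = threading.Thread(target=run_task, args=(task, completion))
    worker.start()
    finished = completion.wait_until_zero(timeout=5.0)
    worker.join()
    return finished
```

Calling `dispatch_and_wait(some_callable)` returns `True` once the callable has run and the signal has reached zero. A condition variable is used here only for brevity; the slide's point is that both sides synchronize through a location in shared memory.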
8. Architecture of HSAemu
HSAemu consists of 6 components
– HSA Runtime
– CPU Simulation Module
– GPU Task Dispatcher
– Functional‐Accurate GPU Simulator (Fast‐GPU Simulator)
– Cycle‐Accurate GPU Simulator (Multi2Sim)
– GPU Helper Functions
9. HSAemu Runtime
User Mode Queue
– Store AQL packets
AQL Queue Manager
– Manage AQL packets in User Mode
Queue
AQL Command Dispatcher
– Launch the execution of kernel jobs on
HSAemu
Support OpenCL runtime
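The three runtime pieces on this slide (a queue that stores packets, a manager that fills it, and a dispatcher that launches kernels) can be sketched as a ring buffer with read and write indices. This is a minimal illustration under stated assumptions: the `AqlPacket` fields are a simplified subset chosen for the example, and `UserModeQueue` and `dispatch_all` are hypothetical names rather than HSAemu code.

```python
from dataclasses import dataclass


@dataclass
class AqlPacket:
    """Simplified stand-in for an AQL dispatch packet (fields are assumptions)."""
    kernel_name: str
    grid_size: int
    workgroup_size: int


class UserModeQueue:
    """Fixed-size ring buffer with monotonically increasing read/write indices."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.write_index = 0
        self.read_index = 0

    def enqueue(self, packet):
        # Queue-manager side: refuse the packet when every slot is occupied.
        if self.write_index - self.read_index == self.capacity:
            return False
        self.slots[self.write_index % self.capacity] = packet
        self.write_index += 1
        return True

    def dequeue(self):
        # Dispatcher side: return the oldest packet, or None when the queue is empty.
        if self.read_index == self.write_index:
            return None
        packet = self.slots[self.read_index % self.capacity]
        self.read_index += 1
        return packet


def dispatch_all(queue, kernels):
    """Drain the queue and run each packet's kernel once per work-item."""
    launched = []
    while (packet := queue.dequeue()) is not None:
        kernel = kernels[packet.kernel_name]
        for work_item_id in range(packet.grid_size):
            kernel(work_item_id)
        launched.append(packet.kernel_name)
    return launched
```

Here `kernels` is a plain dictionary mapping a kernel name to a Python callable that takes a work-item ID. Running work-items one after another in a loop is the simplest possible functional model and says nothing about how HSAemu actually schedules them.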
11. CPU Simulation Module (2)
PQEMU
– A parallel system emulator based on QEMU
– Two efficient synchronization models (UCC/SCC)
– Dynamic binary translation (DBT) technique
– A project sponsored by MTK
Agent code, HSA runtime, and operating
system are run on PQEMU
[Figure: Unified Code Cache (UCC) Model; labels: Code Cache, DBT (×4), CPU (×4)]
“PQEMU: A Parallel System Emulator Based on QEMU” (ICPADS 2011)
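The idea behind dynamic binary translation with a unified code cache can be sketched as follows: a guest block is translated the first time any virtual CPU reaches it, stored under its guest program counter, and reused by every virtual CPU afterwards. This is a toy model under loud assumptions: `UnifiedCodeCache`, `toy_translate`, and the single global lock are hypothetical and much simpler than PQEMU's synchronization, and a "translated block" here is just a Python callable.

```python
import threading


class UnifiedCodeCache:
    """One translation cache shared by all virtual CPUs (toy model)."""

    def __init__(self, translate):
        self._translate = translate
        self._blocks = {}              # guest PC -> translated host block
        self._lock = threading.Lock()
        self.translations = 0          # how many blocks were actually translated

    def lookup(self, guest_pc):
        # Holding the lock during translation is a simplification: it guarantees
        # that each guest block is translated exactly once.
        with self._lock:
            block = self._blocks.get(guest_pc)
            if block is None:
                block = self._translate(guest_pc)
                self._blocks[guest_pc] = block
                self.translations += 1
            return block


def toy_translate(guest_pc):
    """Toy 'translation': a host callable standing in for a translated block."""
    return lambda acc: acc + guest_pc


def run_vcpu(cache, trace, results, index):
    """Execute a trace of guest PCs on one virtual CPU thread."""
    acc = 0
    for guest_pc in trace:
        acc = cache.lookup(guest_pc)(acc)
    results[index] = acc


def run_all(num_vcpus, trace):
    """Run the same trace on several virtual CPUs that share one code cache."""
    cache = UnifiedCodeCache(toy_translate)
    results = [None] * num_vcpus
    threads = [
        threading.Thread(target=run_vcpu, args=(cache, trace, results, i))
        for i in range(num_vcpus)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return cache.translations, results
```

With four virtual CPUs and a trace that visits three distinct guest PCs, `run_all` reports three translations in total, which is the benefit a shared cache is meant to provide.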
34. M2S‐GPU Simulator (3)
M2S Bridge: An interface to launch
M2S GPU Module
– Initialize the data structures used by
AMD Southern Islands GPU, including a
memory register for AMD Southern
Islands GPU to access the shared system
memory in HSAemu
– Invoke M2S GPU Module (the AMD
Southern Islands GPU module in
Multi2Sim)
36. GPU Helper Functions (1)
Memory Helper Function
– A soft‐MMU for the GPU, with a page table
walker and a TLB, to enable the hUMA model
– Support redirecting accesses to local
segment memory to non‐shared private
memory in the GPU
Kernel Information Helper Function
– Collect and return information about the GPU
simulation and the current execution state
– Retrieve kernel information such as
work‐item ID, work‐group size, etc., from the
AQL packet
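The memory helper's translation path (check a TLB first, fall back to a page table walk on a miss) can be sketched in a few lines. This is a minimal illustration, assuming a single flat page table, 4 KiB pages, and FIFO eviction; `SoftMMU`, `PageFault`, and these parameters are hypothetical choices for the example and do not describe HSAemu's implementation. A real walker would traverse the multi-level page tables that the guest operating system maintains.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageFault(Exception):
    """Raised when a virtual page has no mapping."""


class SoftMMU:
    """Toy software MMU: a small TLB in front of a flat page table."""

    def __init__(self, page_table, tlb_capacity=4):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # insertion-ordered, so the first key is the oldest
        self.tlb_capacity = tlb_capacity
        self.hits = 0
        self.misses = 0

    def _walk(self, vpn):
        # Stand-in for the page table walk: a single dictionary lookup.
        try:
            return self.page_table[vpn]
        except KeyError:
            raise PageFault(hex(vpn * PAGE_SIZE)) from None

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.hits += 1
            pfn = self.tlb[vpn]
        else:
            self.misses += 1
            pfn = self._walk(vpn)
            if len(self.tlb) >= self.tlb_capacity:
                self.tlb.pop(next(iter(self.tlb)))  # evict the oldest entry (FIFO)
            self.tlb[vpn] = pfn
        return pfn * PAGE_SIZE + offset
```

Because the CPU and GPU models would consult the same page table, the same virtual address resolves to the same physical location on both sides, which is the property the hUMA model relies on.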
39. Recall: Hardware Simulation of HSAemu
HSA hardware components simulated
– Multicore CPU: A parallel multicore CPU model simulation
– Functional‐Accurate GPU: A generic GPU model simulation
– Cycle‐Accurate GPU: AMD Southern Islands GPU model
simulation
– hUMA: A unified address space between CPU and GPU
simulation
– Synchronization Primitive: Barrier instruction simulation
– Hardware AQL Queue: A HW dispatch queue for GPU
simulation
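One way a functional simulator can model the barrier instruction listed above is to run every work-item of a work-group up to the barrier before letting any of them continue. The sketch below does this with Python generators, treating each `yield` in a kernel as a barrier. It is an assumption-laden illustration: `run_workgroup` and `reverse_kernel` are hypothetical, and nothing here reflects how HSAemu itself implements barriers.

```python
def run_workgroup(kernel, workgroup_size, shared):
    """Run one work-group; each `yield` in the kernel is treated as a barrier."""
    active = [kernel(local_id, workgroup_size, shared)
              for local_id in range(workgroup_size)]
    while active:
        still_running = []
        for item in active:
            try:
                next(item)                  # run this work-item up to its next barrier
                still_running.append(item)
            except StopIteration:
                pass                        # this work-item has finished
        # Either every work-item reached the barrier or every one finished.
        if still_running and len(still_running) != len(active):
            raise RuntimeError("work-items diverged at a barrier")
        active = still_running


def reverse_kernel(local_id, size, shared):
    """Example kernel: reverse `src` into `dst` through a staging buffer."""
    shared["tmp"][local_id] = shared["src"][local_id]
    yield  # barrier: all writes to tmp are complete before any read below
    shared["dst"][local_id] = shared["tmp"][size - 1 - local_id]
```

Without the barrier, work-item 0 would read `tmp[size - 1]` before the last work-item had written it. With the barrier in place, the read phase starts only after every work-item has finished its write phase.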
40. Recall: Software Utilities of HSAemu
HSA software utilities designed
– HSA Runtime: HSA runtime library (OpenCL runtime)
– Topology Discovery: Discover the current platform topology
– User Mode Queue: A queue for each user application
– Signal Event: Notify GPU to work
– HSAIL Generator: A PTX to HSAIL source level translator
– BRIG Generator: Generate a binary format from a Kernel file
– HSAIL Translator: Translate HSAIL to host executable binary
– GPU Code Cache: Store translated host binaries
47. Future work
Enhance HSAemu by implementing more HSA
features
Integrate HSAemu with some existing cycle‐accurate
GPU simulators
Design a cycle‐accurate simulator based on PQEMU
for generic CPU model
Design a cycle‐accurate simulator based on PQEMU
for big.LITTLE CPU model