HSA QUEUING MODEL
HAKAN PERSSON, SENIOR PRINCIPAL ENGINEER,
ARM
HSA QUEUEING, MOTIVATION
MOTIVATION (TODAY’S PICTURE)
© Copyright 2014 HSA Foundation. All Rights Reserved
[Figure: today's job flow across Application, OS and GPU: transfer buffer to GPU, copy/map memory, queue job, schedule job, start job, finish job, schedule application, get buffer, copy/map memory.]
HSA QUEUEING: REQUIREMENTS
REQUIREMENTS
• Three key technologies are used to build the user mode queueing mechanism:
  • Shared Virtual Memory
  • System Coherency
  • Signaling
• AQL (Architected Queueing Language) enables any agent to enqueue tasks
SHARED VIRTUAL MEMORY
SHARED VIRTUAL MEMORY (TODAY)
• Multiple virtual memory address spaces
[Figure: CPU0 uses VIRTUAL MEMORY1 and the GPU uses VIRTUAL MEMORY2; both map onto the same PHYSICAL MEMORY (VA1->PA1, VA2->PA1).]
SHARED VIRTUAL MEMORY (HSA)
• Common virtual memory for all HSA agents
[Figure: CPU0 and the GPU share one VIRTUAL MEMORY space that maps onto PHYSICAL MEMORY, with the same VA->PA translation for both.]
SHARED VIRTUAL MEMORY
• Advantages
  • No mapping tricks, no copying back and forth between different physical addresses
  • Send pointers (not data) back and forth between HSA agents
• Implications
  • Common page tables (and common interpretation of architectural semantics such as shareability, protection, etc.)
  • Common mechanisms for address translation (and for servicing address translation faults)
  • Concept of a process address space ID (PASID) to allow multiple per-process virtual address spaces within the system
SHARED VIRTUAL MEMORY
• Specifics
  • Minimum supported VA width is 48 bits for 64-bit systems and 32 bits for 32-bit systems
  • HSA agents may reserve VA ranges for internal use via system software
  • All HSA agents other than the host unit must use the lowest privilege level
  • If present, read/write access flags for page tables must be maintained by all agents
  • Read/write permissions apply to all HSA agents equally
GETTING THERE …
[Figure: the Application / OS / GPU job flow diagram from the motivation slide, repeated.]
CACHE COHERENCY
CACHE COHERENCY DOMAINS (1/3)
• Data accesses to the global memory segment from all HSA agents shall be coherent without the need for explicit cache maintenance
CACHE COHERENCY DOMAINS (2/3)
• Advantages
  • Composability
  • Reduced software complexity when communicating between agents
  • Lower barrier to entry when porting software
• Implications
  • Hardware coherency support between all HSA agents
  • Can take many forms
    • Stand-alone snoop filters / directories
    • Combined L3/filters
    • Snoop-based systems (no filter)
    • Etc.
CACHE COHERENCY DOMAINS (3/3)
• Specifics
  • No requirement for instruction memory accesses to be coherent
  • Only applies to the Primary memory type
  • No requirement for HSA agents to maintain coherency to any memory location where the HSA agents do not specify the same memory attributes
  • Read-only image data is required to remain static during the execution of an HSA kernel
    • No double mapping (via different attributes) in order to modify it; the data must remain static
GETTING CLOSER …
[Figure: the Application / OS / GPU job flow diagram from the motivation slide, repeated.]
SIGNALING
SIGNALING (1/3)
• HSA agents support the ability to use signaling objects
  • All creation and destruction of signaling objects occurs via HSA runtime APIs
  • An HSA agent can access signaling objects directly:
    • Signal a signal object (this wakes up HSA agents waiting on the object)
    • Query the object's current value
    • Wait on the object (various conditions supported)
SIGNALING (2/3)
• Advantages
  • Enables asynchronous events between HSA agents without involving the kernel
  • Common idiom for work offload
  • Low-power waiting
• Implications
  • Runtime support required
  • Commonly implemented on top of cache coherency flows
SIGNALING (3/3)
• Specifics
  • Only supported within a PASID
  • Supported wait conditions are =, !=, < and >=
  • Wait operations may return sporadically (no guarantee against false positives)
    • The programmer must test the condition after the wait returns
  • Wait operations have a maximum duration before returning
  • The HSAIL atomic operations are supported on signal objects
  • Signal objects are opaque
    • Must use dedicated HSAIL/HSA runtime operations
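Because a wait may return before its condition holds, callers typically wrap it in a loop that re-tests the value. The following is a minimal sketch of that pattern, assuming a signal can be modeled as a plain `std::atomic<int64_t>`. The names `WaitCond`, `satisfied`, `wait_once` and `wait_until` are hypothetical and are not HSA runtime APIs; real signal objects are opaque.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical model: the four wait conditions listed on the slide.
enum class WaitCond { Eq, Ne, Lt, Gte };

inline bool satisfied(WaitCond c, int64_t observed, int64_t compare) {
    switch (c) {
        case WaitCond::Eq:  return observed == compare;
        case WaitCond::Ne:  return observed != compare;
        case WaitCond::Lt:  return observed <  compare;
        case WaitCond::Gte: return observed >= compare;
    }
    return false;
}

// One bounded wait. It has a maximum duration and may therefore return
// a value that does not satisfy the condition.
inline int64_t wait_once(const std::atomic<int64_t>& sig, WaitCond c,
                         int64_t compare, std::chrono::microseconds budget) {
    assert(budget.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + budget;
    int64_t v = sig.load(std::memory_order_acquire);
    while (!satisfied(c, v, compare) &&
           std::chrono::steady_clock::now() < deadline) {
        std::this_thread::yield();
        v = sig.load(std::memory_order_acquire);
    }
    return v;
}

// The caller re-tests the returned value and retries until it holds.
inline int64_t wait_until(const std::atomic<int64_t>& sig, WaitCond c,
                          int64_t compare) {
    int64_t v;
    do {
        v = wait_once(sig, c, compare, std::chrono::microseconds(100));
    } while (!satisfied(c, v, compare));
    return v;
}
```

A spin with `yield` stands in here for the low-power wait a real implementation might provide.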
ALMOST THERE…
[Figure: the Application / OS / GPU job flow diagram from the motivation slide, repeated.]
USER MODE QUEUING
ONE BLOCK LEFT
[Figure: the Application / OS / GPU job flow diagram from the motivation slide, repeated.]
USER MODE QUEUEING (1/3)
• User mode queueing
  • Enables user space applications to enqueue jobs (“Dispatch Packets”) for HSA agents directly, without OS intervention
  • Queues are created/destroyed via calls to the HSA runtime
  • One (or many) agents enqueue packets; a single agent dequeues packets
  • Requires coherency and shared virtual memory
USER MODE QUEUEING (2/3)
• Advantages
  • Avoids involving the kernel/driver when dispatching work for an agent
  • Lower-latency job dispatch enables finer granularity of offload
  • Standard memory protection mechanisms may be used to protect communication with the consuming agent
• Implications
  • Packet formats/fields are architected: standard across vendors!
    • Guaranteed backward compatibility
  • Packets are enqueued/dequeued via an architected protocol (all via memory accesses and signaling)
    • More on this later…
SUCCESS!
[Figure: the Application / OS / GPU job flow diagram from the motivation slide, repeated.]
SUCCESS!
[Figure: the resulting job flow across Application, OS and GPU: queue job, start job, finish job.]
ARCHITECTED QUEUEING LANGUAGE, QUEUES
ARCHITECTED QUEUEING LANGUAGE
• HSA queues look just like standard shared memory queues, supporting multi-producer, single-consumer
  • A single-producer variant is defined, with some optimizations possible
• Queues consist of storage, read/write indices, ID, etc.
• Queues are created/destroyed via calls to the HSA runtime
• “Packets” are placed in queues directly from user mode, via an architected protocol
• Packet format is architected
[Figure: two producers write packets into queue storage held in coherent, shared memory; a single consumer reads them. A write index and a read index mark the producer and consumer positions.]
ARCHITECTED QUEUING LANGUAGE
• Packets are read and dispatched for execution from the queue in order, but may complete in any order
  • There is no guarantee that more than one packet will be processed in parallel at a time
• There may be many queues. A single agent may also consume from several queues.
• Any HSA agent may enqueue packets
  • CPUs
  • GPUs
  • Other accelerators
QUEUE STRUCTURE
Offset (bytes)  Size (bytes)  Field           Notes
0               4             queueType       Differentiates different queues
4               4             queueFeatures   Indicates supported features
8               8             baseAddress     Pointer to packet array
16              8             doorbellSignal  HSA signaling object handle
24              4             size            Packet array cardinality
28              4             queueId         Unique per process
32              8             serviceQueue    Queue for callback services
intrinsic       8             writeIndex      Packet array write index
intrinsic       8             readIndex       Packet array read index
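The layout in the table can be checked mechanically. The sketch below mirrors the visible fields in a C++ struct and asserts their offsets. The struct name `QueueSketch` and the use of raw 64-bit integers for pointers and handles are assumptions for illustration, and doorbellSignal is taken to occupy 8 bytes, which is what the offsets of the neighbouring fields (16 and 24) imply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the queue structure table; not the runtime's
// actual type definition.
struct QueueSketch {
    uint32_t queueType;       // offset 0
    uint32_t queueFeatures;   // offset 4
    uint64_t baseAddress;     // offset 8, pointer to the packet array
    uint64_t doorbellSignal;  // offset 16, signaling object handle
    uint32_t size;            // offset 24, packet array cardinality
    uint32_t queueId;         // offset 28, unique per process
    uint64_t serviceQueue;    // offset 32, queue for callback services
    // writeIndex and readIndex are intrinsic properties and are
    // deliberately not part of this visible layout.
};

static_assert(offsetof(QueueSketch, queueFeatures) == 4, "queueFeatures offset");
static_assert(offsetof(QueueSketch, baseAddress) == 8, "baseAddress offset");
static_assert(offsetof(QueueSketch, doorbellSignal) == 16, "doorbellSignal offset");
static_assert(offsetof(QueueSketch, size) == 24, "size offset");
static_assert(offsetof(QueueSketch, queueId) == 28, "queueId offset");
static_assert(offsetof(QueueSketch, serviceQueue) == 32, "serviceQueue offset");
static_assert(sizeof(QueueSketch) == 40, "visible fields span 40 bytes");
```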
QUEUE VARIANTS
• queueType and queueFeatures together define queue semantics and capabilities
• Two queueType values are defined; other values are reserved:
  • MULTI: queue supports multiple producers
  • SINGLE: queue supports a single producer
• queueFeatures is a bitfield indicating capabilities
  • DISPATCH (bit 0): if set, the queue supports DISPATCH packets
  • AGENT_DISPATCH (bit 1): if set, the queue supports AGENT_DISPATCH packets
  • All other bits are reserved and must be 0
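As an illustration of how the queueFeatures bitfield might be decoded, here is a small sketch. The bit positions come from the slide; the numeric values chosen for MULTI and SINGLE are assumptions (the slide names them without giving encodings), and all identifiers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encodings: the slide does not state numeric values.
enum class QueueType : uint32_t { Multi = 0, Single = 1 };

// Bit positions as listed on the slide.
constexpr uint32_t kFeatureDispatch      = 1u << 0;
constexpr uint32_t kFeatureAgentDispatch = 1u << 1;
constexpr uint32_t kFeatureKnownMask =
    kFeatureDispatch | kFeatureAgentDispatch;

// All bits outside the two defined features are reserved and must be 0.
constexpr bool features_valid(uint32_t f) {
    return (f & ~kFeatureKnownMask) == 0;
}
constexpr bool supports_dispatch(uint32_t f) {
    return (f & kFeatureDispatch) != 0;
}
constexpr bool supports_agent_dispatch(uint32_t f) {
    return (f & kFeatureAgentDispatch) != 0;
}
```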
QUEUE STRUCTURE DETAILS
• Queue doorbells are HSA signaling objects with restrictions
  • Created as part of the queue; lifetime is tied to the queue object
  • Atomic read-modify-write not allowed
• The size field value must be a power of 2
• serviceQueue can be used by an HSA kernel for callback services
  • Provided by the application when the queue is created
  • Can be mapped to an HSA runtime provided serviceQueue, an application serviced queue, or NULL if no serviceQueue is required
READ/WRITE INDICES
• readIndex and writeIndex properties are part of the queue, but not visible in the queue structure
  • Accessed through HSA runtime API and HSAIL operations
• HSA runtime/HSAIL operations are defined to
  • Read the readIndex or writeIndex property
  • Write the readIndex or writeIndex property
  • Add a constant to the writeIndex property (returns the previous writeIndex value)
  • CAS on the writeIndex property
• readIndex and writeIndex operations are treated as atomic in the memory model
  • relaxed, acquire, release and acquire-release variants are defined as applicable
• readIndex and writeIndex never wrap
• PacketID: the index of a particular packet
  • Uniquely identifies each packet of a queue
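Because the indices never wrap, a packetID is a monotonically increasing 64-bit count, and the slot in the packet array is obtained by masking with the (power of 2) queue size. A minimal sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

constexpr bool is_power_of_two(uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Slot in the packet array for a given packetID. Masking only works
// because the queue size is a power of 2.
inline uint32_t slot_of(uint64_t packetID, uint32_t size) {
    assert(is_power_of_two(size));
    return static_cast<uint32_t>(packetID & (size - 1));
}

// Number of packets reserved but not yet consumed. Since neither index
// wraps, a plain subtraction is enough.
inline uint64_t in_flight(uint64_t writeIndex, uint64_t readIndex) {
    assert(writeIndex >= readIndex);
    return writeIndex - readIndex;
}
```

For example, with a queue of size 4, packetIDs 1 and 5 share slot 1, but they remain distinct identifiers.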
PACKET ENQUEUE
• Packet enqueue follows a few simple steps:
  • Reserve space
    • Multiple packets can be reserved at a time
  • Write packet to queue
  • Mark packet as valid
    • Producer is no longer allowed to modify the packet
    • Consumer is allowed to start processing the packet
  • Notify the consumer of the packet through the queue doorbell
    • Multiple packets can be notified at a time
    • The doorbell signal should be signaled with the last packetID notified
    • On the small machine model, the lower 32 bits of the packetID are used
PACKET RESERVATION
• Two flows envisaged
  • Atomic add to writeIndex with the number of packets to reserve
    • Producer must wait until packetID < readIndex + size before writing to the packet
    • Queue can be sized so that the wait is unlikely (or impossible)
    • Suitable when many threads use one queue
  • Check that the queue is not full first, then use atomic CAS to update writeIndex
    • Can be inefficient if many threads use the same queue
    • Allows a different failure model if the queue is congested
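The two reservation flows can be sketched side by side. This is an illustrative model only: the indices are represented as `std::atomic<uint64_t>` members of a hypothetical `IndexModel` struct, whereas real code would go through the HSA runtime or HSAIL index operations.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the queue's intrinsic index properties.
struct IndexModel {
    std::atomic<uint64_t> writeIndex{0};
    std::atomic<uint64_t> readIndex{0};
    uint32_t size;  // packet array cardinality, must be a power of 2
    explicit IndexModel(uint32_t n) : size(n) {
        assert(n != 0 && (n & (n - 1)) == 0);
    }
};

// Flow 1: reserve unconditionally with an atomic add. Returns the first
// reserved packetID; the caller must still wait for slot_free().
inline uint64_t reserve_add(IndexModel& q, uint64_t count) {
    return q.writeIndex.fetch_add(count, std::memory_order_relaxed);
}

// The wait condition from the slide: packetID < readIndex + size.
inline bool slot_free(const IndexModel& q, uint64_t packetID) {
    return packetID < q.readIndex.load(std::memory_order_acquire) + q.size;
}

// Flow 2: check for space first, then CAS. Returns false when the queue
// is full, which lets the caller choose its own failure handling.
inline bool reserve_cas(IndexModel& q, uint64_t& packetID) {
    uint64_t w = q.writeIndex.load(std::memory_order_relaxed);
    for (;;) {
        if (w >= q.readIndex.load(std::memory_order_acquire) + q.size) {
            return false;  // queue full
        }
        // On failure, compare_exchange_weak reloads w with the current value.
        if (q.writeIndex.compare_exchange_weak(w, w + 1,
                                               std::memory_order_relaxed)) {
            packetID = w;
            return true;
        }
    }
}
```

The add flow never fails but may leave the producer waiting on a reserved slot; the CAS flow can report a full queue immediately, at the cost of retries under contention.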
QUEUE OPTIMIZATIONS
• Queue behavior is loosely defined to allow optimizations
• Some potential producer behavior optimizations:
  • Keep a local copy of readIndex, update when required
  • For single-producer queues:
    • Keep a local copy of writeIndex
    • Use a store operation rather than an add/CAS atomic to update writeIndex
• Some potential consumer behavior optimizations:
  • Use the packet format field, rather than the writeIndex property, to determine whether a packet has been submitted
  • Speculatively read multiple packets from the queue
  • Do not update readIndex for each packet processed
  • Rely on the value used for doorbellSignal to notify new packets
    • Especially useful for single-producer queues
POTENTIAL MULTI-PRODUCER ALGORITHM
// Allocate packet
uint64_t packetID = hsa_queue_add_write_index_relaxed(q, 1);
// Wait until the queue is no longer full.
uint64_t rdIdx;
do {
    rdIdx = hsa_queue_load_read_index_relaxed(q);
} while (packetID >= (rdIdx + q->size));
// Calculate the array index (size is a power of 2, so masking works)
uint32_t arrayIdx = packetID & (q->size-1);
// Copy over the packet; the format field is INVALID
q->baseAddress[arrayIdx] = pkt;
// Update format field with release semantics
q->baseAddress[arrayIdx].hdr.format.store(DISPATCH, std::memory_order_release);
// Ring doorbell; the relaxed variant is shown (could also amortize over multiple packets)
hsa_signal_send_relaxed(q->doorbellSignal, packetID);
POTENTIAL CONSUMER ALGORITHM
// Get location of next packet
uint64_t readIndex = hsa_queue_load_read_index_relaxed(q);
// calculate the index
uint32_t arrayIdx = readIndex & (q->size-1);
// spin while empty (could also perform low-power wait on doorbell)
while (INVALID == q->baseAddress[arrayIdx].hdr.format) { }
// copy over the packet
pkt = q->baseAddress[arrayIdx];
// set the format field to invalid
q->baseAddress[arrayIdx].hdr.format.store(INVALID, std::memory_order_relaxed);
// Update the readIndex using HSA intrinsic
hsa_queue_store_read_index_relaxed(q, readIndex+1);
// Now process <pkt>!
ARCHITECTED QUEUEING LANGUAGE, PACKETS
PACKETS
• Packets come in three main types with architected layouts, in addition to the Always reserved and Invalid formats
  • Always reserved & Invalid
    • Do not contain any valid tasks and are not processed (queue will not progress)
  • Dispatch
    • Specifies kernel execution over a grid
  • Agent Dispatch
    • Specifies a single function to perform with a set of parameters
  • Barrier
    • Used for task dependencies
COMMON PACKET HEADER
Start offset (bytes)  Format    Field name           Description
0                     uint16_t  format:8             Contains the packet type (Always reserved, Invalid, Dispatch, Agent Dispatch, and Barrier). Other values are reserved and should not be used.
                                barrier:1            If set, processing of the packet will only begin when all preceding packets are complete.
                                acquireFenceScope:2  Determines the scope and type of the memory fence operation applied before the packet enters the active phase. Must be 0 for Barrier packets.
                                releaseFenceScope:2  Determines the scope and type of the memory fence operation applied after kernel completion but before the packet is completed.
                                reserved:3           Must be 0.
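The header fields add up to exactly 16 bits (8 + 1 + 2 + 2 + 3). The sketch below packs and unpacks them, assuming the fields are laid out from the least significant bit in the order listed; that bit order is an assumption, as are the function names. The numeric encodings of the packet types are not given here, so `format` is passed as a raw value.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (LSB first): format bits 0-7, barrier bit 8,
// acquireFenceScope bits 9-10, releaseFenceScope bits 11-12,
// reserved bits 13-15 (left as 0).
inline uint16_t make_header(uint8_t format, bool barrier,
                            uint8_t acquireFenceScope,
                            uint8_t releaseFenceScope) {
    assert(acquireFenceScope < 4 && releaseFenceScope < 4);
    return static_cast<uint16_t>(
        format |
        (static_cast<unsigned>(barrier) << 8) |
        (static_cast<unsigned>(acquireFenceScope) << 9) |
        (static_cast<unsigned>(releaseFenceScope) << 11));
}

inline uint8_t header_format(uint16_t h) {
    return static_cast<uint8_t>(h & 0xFF);
}
inline bool header_barrier(uint16_t h) {
    return ((h >> 8) & 0x1) != 0;
}
inline uint8_t header_acquire_scope(uint16_t h) {
    return static_cast<uint8_t>((h >> 9) & 0x3);
}
inline uint8_t header_release_scope(uint16_t h) {
    return static_cast<uint8_t>((h >> 11) & 0x3);
}
```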
DISPATCH PACKET
Start offset (bytes)  Format    Field name               Description
0                     uint16_t  header                   Packet header
2                     uint16_t  dimensions:2             Number of dimensions specified in gridSize. Valid values are 1, 2, or 3.
                                reserved:14              Must be 0.
4                     uint16_t  workgroupSize.x          x dimension of work-group (measured in work-items).
6                     uint16_t  workgroupSize.y          y dimension of work-group (measured in work-items).
8                     uint16_t  workgroupSize.z          z dimension of work-group (measured in work-items).
10                    uint16_t  reserved2                Must be 0.
12                    uint32_t  gridSize.x               x dimension of grid (measured in work-items).
16                    uint32_t  gridSize.y               y dimension of grid (measured in work-items).
20                    uint32_t  gridSize.z               z dimension of grid (measured in work-items).
24                    uint32_t  privateSegmentSizeBytes  Total size in bytes of private memory allocation request (per work-item).
28                    uint32_t  groupSegmentSizeBytes    Total size in bytes of group memory allocation request (per work-group).
32                    uint64_t  kernelObjectAddress      Address of an object in memory that includes an implementation-defined executable ISA image for the kernel.
40                    uint64_t  kernargAddress           Address of memory containing kernel arguments.
48                    uint64_t  reserved3                Must be 0.
56                    uint64_t  completionSignal         Address of HSA signaling object used to indicate completion of the job.
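The dispatch packet is a fixed 64-byte record. As a rough sketch, the table can be mirrored in a struct whose offsets are checked at compile time. The struct name, the use of arrays for the x/y/z triples and the `total_work_items` helper are assumptions for illustration, and the dimensions field is assumed to sit in the low 2 bits of its 16-bit word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the dispatch packet table.
struct DispatchPacketSketch {
    uint16_t header;
    uint16_t dimensions;        // assumed: low 2 bits used, upper 14 reserved
    uint16_t workgroupSize[3];  // x, y, z in work-items
    uint16_t reserved2;
    uint32_t gridSize[3];       // x, y, z in work-items
    uint32_t privateSegmentSizeBytes;
    uint32_t groupSegmentSizeBytes;
    uint64_t kernelObjectAddress;
    uint64_t kernargAddress;
    uint64_t reserved3;
    uint64_t completionSignal;
};

static_assert(offsetof(DispatchPacketSketch, workgroupSize) == 4, "");
static_assert(offsetof(DispatchPacketSketch, reserved2) == 10, "");
static_assert(offsetof(DispatchPacketSketch, gridSize) == 12, "");
static_assert(offsetof(DispatchPacketSketch, privateSegmentSizeBytes) == 24, "");
static_assert(offsetof(DispatchPacketSketch, groupSegmentSizeBytes) == 28, "");
static_assert(offsetof(DispatchPacketSketch, kernelObjectAddress) == 32, "");
static_assert(offsetof(DispatchPacketSketch, kernargAddress) == 40, "");
static_assert(offsetof(DispatchPacketSketch, completionSignal) == 56, "");
static_assert(sizeof(DispatchPacketSketch) == 64, "packet is 64 bytes");

// Work-items in the grid, using only the dimensions the packet declares.
inline uint64_t total_work_items(const DispatchPacketSketch& p) {
    const unsigned dims = p.dimensions & 0x3u;
    assert(dims >= 1 && dims <= 3);
    uint64_t total = 1;
    for (unsigned d = 0; d < dims; ++d) {
        total *= p.gridSize[d];
    }
    return total;
}
```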
AGENT DISPATCH PACKET
Start offset (bytes)  Format    Field name        Description
0                     uint16_t  header            Packet header
2                     uint16_t  type              The function to be performed by the destination agent. The type value is split into the following ranges:
                                                    0x0000:0x3FFF – Vendor specific
                                                    0x4000:0x7FFF – HSA runtime
                                                    0x8000:0xFFFF – User registered function
4                     uint32_t  reserved2         Must be 0.
8                     uint64_t  returnLocation    Pointer to location to store the function return value in.
16                    uint64_t  arg[0]            64-bit direct or indirect arguments (arg[0] to arg[3]).
24                    uint64_t  arg[1]
32                    uint64_t  arg[2]
40                    uint64_t  arg[3]
48                    uint64_t  reserved3         Must be 0.
56                    uint64_t  completionSignal  Address of HSA signaling object used to indicate completion of the job.
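A consuming agent would typically begin by deciding which of the three type ranges a packet falls into. The sketch below mirrors the table as a struct and classifies the type field; the struct, enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the agent dispatch packet table.
struct AgentDispatchPacketSketch {
    uint16_t header;
    uint16_t type;
    uint32_t reserved2;
    uint64_t returnLocation;
    uint64_t arg[4];
    uint64_t reserved3;
    uint64_t completionSignal;
};

static_assert(offsetof(AgentDispatchPacketSketch, returnLocation) == 8, "");
static_assert(offsetof(AgentDispatchPacketSketch, arg) == 16, "");
static_assert(offsetof(AgentDispatchPacketSketch, reserved3) == 48, "");
static_assert(offsetof(AgentDispatchPacketSketch, completionSignal) == 56, "");
static_assert(sizeof(AgentDispatchPacketSketch) == 64, "packet is 64 bytes");

enum class AgentFunctionClass { VendorSpecific, HsaRuntime, UserRegistered };

// Ranges as given in the table: 0x0000-0x3FFF, 0x4000-0x7FFF, 0x8000-0xFFFF.
constexpr AgentFunctionClass classify_type(uint16_t type) {
    return type <= 0x3FFF ? AgentFunctionClass::VendorSpecific
         : type <= 0x7FFF ? AgentFunctionClass::HsaRuntime
                          : AgentFunctionClass::UserRegistered;
}
```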
BARRIER PACKET
• Used for specifying dependences between packets
  • The HSA agent will not launch any further packets from this queue until the barrier packet signal conditions are met
• Used for specifying dependences on packets dispatched from any queue
  • Execution phase completes only when all of the dependent signals (up to five) have been signaled (with the value of 0)
  • Or if an error has occurred in one of the packets upon which we have a dependence
BARRIER PACKET
Start offset (bytes)  Format    Field name        Description
0                     uint16_t  header            Packet header, see 2.8.1 Packet header (p. 16).
2                     uint16_t  reserved2         Must be 0.
4                     uint32_t  reserved3         Must be 0.
8                     uint64_t  depSignal0        Address of dependent signaling objects to be evaluated by the packet processor (depSignal0 to depSignal4).
16                    uint64_t  depSignal1
24                    uint64_t  depSignal2
32                    uint64_t  depSignal3
40                    uint64_t  depSignal4
48                    uint64_t  reserved4         Must be 0.
56                    uint64_t  completionSignal  Address of HSA signaling object used to indicate completion of the job.
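The condition a packet processor evaluates for a barrier packet can be sketched as follows. Signals are modeled as `std::atomic<int64_t>`, and a null entry is assumed to mean "no dependency in this slot"; both are assumptions for illustration, as are all names. The error case mentioned earlier is not modeled.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the barrier packet table.
struct BarrierPacketSketch {
    uint16_t header;
    uint16_t reserved2;
    uint32_t reserved3;
    uint64_t depSignal[5];
    uint64_t reserved4;
    uint64_t completionSignal;
};

static_assert(offsetof(BarrierPacketSketch, depSignal) == 8, "");
static_assert(offsetof(BarrierPacketSketch, reserved4) == 48, "");
static_assert(offsetof(BarrierPacketSketch, completionSignal) == 56, "");
static_assert(sizeof(BarrierPacketSketch) == 64, "packet is 64 bytes");

using SignalModel = std::atomic<int64_t>;

// True when every populated dependency has reached the value 0.
inline bool barrier_satisfied(const std::array<const SignalModel*, 5>& deps) {
    for (const SignalModel* d : deps) {
        if (d != nullptr && d->load(std::memory_order_acquire) != 0) {
            return false;
        }
    }
    return true;
}
```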
DEPENDENCES
• A user may never assume more than one packet is being executed by an HSA agent at a time
• Implications:
  • Packets can’t poll on shared memory values which will be set by packets issued from other queues, unless the user has ensured the proper ordering
  • To ensure all previous packets from a queue have been completed, use the Barrier bit
  • To ensure specific packets from any queue have completed, use the Barrier packet
HSA QUEUEING, PACKET EXECUTION
PACKET EXECUTION
• Launch phase
  • Initiated when launch conditions are met
    • All preceding packets in the queue must have exited launch phase
    • If the barrier bit in the packet header is set, then all preceding packets in the queue must have exited completion phase
  • Includes memory acquire fence
• Active phase
  • Execute the packet
  • Barrier packets remain in Active phase until conditions are met
• Completion phase
  • First step is memory release fence: make results visible
  • completionSignal field is then signaled with a decrementing atomic
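The launch conditions and the completion step can be expressed compactly. This is a sketch under simplifying assumptions: progress is tracked as two counters per queue, the completion signal is modeled as a `std::atomic<int64_t>`, and all names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-queue bookkeeping: how many packets have left the
// launch phase and how many have left the completion phase.
struct QueueProgress {
    uint64_t exitedLaunch = 0;
    uint64_t exitedCompletion = 0;
};

// Launch condition for the packet with the given packetID (0-based).
inline bool may_launch(const QueueProgress& p, uint64_t packetID,
                       bool barrierBit) {
    // All preceding packets must have exited the launch phase.
    if (p.exitedLaunch != packetID) return false;
    // With the barrier bit set, they must also have completed.
    if (barrierBit && p.exitedCompletion != packetID) return false;
    return true;
}

// Completion phase: release fence first so results are visible, then
// decrement the completion signal. Returns the new signal value.
inline int64_t complete_packet(std::atomic<int64_t>& completionSignal) {
    std::atomic_thread_fence(std::memory_order_release);
    return completionSignal.fetch_sub(1, std::memory_order_relaxed) - 1;
}
```

A waiter that initialized the signal to 1 would see it reach 0 once the packet completes.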
PACKET EXECUTION – BARRIER BIT
[Figure: timeline. Pkt1 and Pkt2 launch, execute and complete; Pkt3 (barrier=1) then launches and executes. Pkt3 launches when all preceding packets in the queue have completed.]
PUTTING IT ALL TOGETHER (FFT)
[Figure: FFT over inputs X[0] to X[7], drawn against time: Packets 1 and 2, a barrier, Packets 3 and 4, a barrier, then Packets 5 and 6.]
PUTTING IT ALL TOGETHER
AQL Pseudo Code
// Send the packets to do the first stage.
aql_dispatch(pkt1);
aql_dispatch(pkt2);
// Send the next two packets, setting the barrier bit so we
// know packets 1 & 2 will be complete before 3 and 4 are
// launched.
aql_dispatch_with_barrier_bit(pkt3);
aql_dispatch(pkt4);
// Same as above (make sure 3 & 4 are done before issuing 5
// & 6).
aql_dispatch_with_barrier_bit(pkt5);
aql_dispatch(pkt6);
// This packet will notify us when 5 & 6 are complete.
aql_dispatch_with_barrier_bit(finish_pkt);
PACKET EXECUTION – BARRIER PACKET
[Figure: timeline across two queues. Signal X is initialized to 1. Task T1 in queue Q1 has X as its completionSignal and decrements it on completion. In queue Q2, a Barrier packet with depSignal0 = X precedes task T2. The barrier completes when signal X is signaled with 0, and T2 launches once the barrier is complete.]
DEPTH FIRST CHILD TASK EXECUTION
• Consider two generations of child tasks
  • Task T submits tasks T.1 & T.2
  • Task T.1 submits tasks T.1.1 & T.1.2
  • Task T.2 submits tasks T.2.1 & T.2.2
• Desired outcome
  • Depth-first child task execution
  • I.e. T -> T.1 -> T.1.1 -> T.1.2 -> T.2 -> T.2.1 -> T.2.2
  • T is passed a signal (allComplete) to decrement when all tasks are complete (T and its children, etc.)
[Figure: task tree. T has children T.1 and T.2; T.1 has children T.1.1 and T.1.2; T.2 has children T.2.1 and T.2.2.]
HOW TO DO THIS WITH HSA QUEUES?
• Use a separate user mode queue for each recursion level
  • Task T submits to queue Q1
  • Tasks T.1 & T.2 submit tasks to queue Q2
  • Queues could be passed in as parameters to task T
• Depth first requires ordering of T.1, T.2 and their children
  • Use an additional signal object (childrenComplete) to track completion of the children of T.1 & T.2
  • childrenComplete is set to the number of children (i.e. 2) by each of T.1 & T.2
A PICTURE SAYS MORE THAN 1000 WORDS
[Figure: the task tree (T; T.1, T.2; T.1.1, T.1.2, T.2.1, T.2.2) mapped onto two queues. Q1 holds T.1, a Barrier, T.2 and a Barrier, with the labels "Wait on childrenComplete" and "Signal allComplete". Q2 holds T.1.1, T.1.2, T.2.1 and T.2.2.]
SUMMARY
KEY HSA TECHNOLOGIES
• HSA combines several mechanisms to enable low-overhead task dispatch
  • Shared Virtual Memory
  • System Coherency
  • Signaling
• AQL
  • User mode queues, from any compatible agent
  • Architected packet format
  • Rich dependency mechanism
  • Flexible and efficient signaling of completion
QUESTIONS?
© Copyright 2014 HSA Foundation. All Rights Reserved

Contenu connexe

Tendances

COSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem portingCOSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem portingEric Lin
 
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA ArchitectureRed Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA ArchitectureEtsuji Nakai
 
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Sean Cohen
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Adrian Huang
 
Spring Boot Actuator
Spring Boot ActuatorSpring Boot Actuator
Spring Boot ActuatorRowell Belen
 
BKK16-317 How to generate power models for EAS and IPA
BKK16-317 How to generate power models for EAS and IPABKK16-317 How to generate power models for EAS and IPA
BKK16-317 How to generate power models for EAS and IPALinaro
 
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common KernelLAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common KernelLinaro
 
Apache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryApache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryTsz-Wo (Nicholas) Sze
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFTDataWorks Summit
 
Trusted firmware deep_dive_v1.0_
Trusted firmware deep_dive_v1.0_Trusted firmware deep_dive_v1.0_
Trusted firmware deep_dive_v1.0_Linaro
 
BUD17-400: Secure Data Path with OPTEE
BUD17-400: Secure Data Path with OPTEE BUD17-400: Secure Data Path with OPTEE
BUD17-400: Secure Data Path with OPTEE Linaro
 
Reverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux KernelReverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux KernelAdrian Huang
 
U Boot or Universal Bootloader
U Boot or Universal BootloaderU Boot or Universal Bootloader
U Boot or Universal BootloaderSatpal Parmar
 
The Linux Kernel Scheduler (For Beginners) - SFO17-421
The Linux Kernel Scheduler (For Beginners) - SFO17-421The Linux Kernel Scheduler (For Beginners) - SFO17-421
The Linux Kernel Scheduler (For Beginners) - SFO17-421Linaro
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlLeon Chen
 
Kernel Recipes 2015: Representing device-tree peripherals in ACPI
Kernel Recipes 2015: Representing device-tree peripherals in ACPIKernel Recipes 2015: Representing device-tree peripherals in ACPI
Kernel Recipes 2015: Representing device-tree peripherals in ACPIAnne Nicolas
 

Tendances (20)

COSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem portingCOSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem porting
 
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA ArchitectureRed Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
 
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
 
Spark
SparkSpark
Spark
 
Apache Spark 101
Apache Spark 101Apache Spark 101
Apache Spark 101
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
 
Spring Boot Actuator
Spring Boot ActuatorSpring Boot Actuator
Spring Boot Actuator
 
BKK16-317 How to generate power models for EAS and IPA
BKK16-317 How to generate power models for EAS and IPABKK16-317 How to generate power models for EAS and IPA
BKK16-317 How to generate power models for EAS and IPA
 
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common KernelLAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel
 
Apache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryApache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft Library
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
 
Trusted firmware deep_dive_v1.0_
Trusted firmware deep_dive_v1.0_Trusted firmware deep_dive_v1.0_
Trusted firmware deep_dive_v1.0_
 
BUD17-400: Secure Data Path with OPTEE
BUD17-400: Secure Data Path with OPTEE BUD17-400: Secure Data Path with OPTEE
BUD17-400: Secure Data Path with OPTEE
 
Reverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux KernelReverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux Kernel
 
U Boot or Universal Bootloader
U Boot or Universal BootloaderU Boot or Universal Bootloader
U Boot or Universal Bootloader
 
The Linux Kernel Scheduler (For Beginners) - SFO17-421
The Linux Kernel Scheduler (For Beginners) - SFO17-421The Linux Kernel Scheduler (For Beginners) - SFO17-421
The Linux Kernel Scheduler (For Beginners) - SFO17-421
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission Control
 
Kernel Recipes 2015: Representing device-tree peripherals in ACPI
Kernel Recipes 2015: Representing device-tree peripherals in ACPIKernel Recipes 2015: Representing device-tree peripherals in ACPI
Kernel Recipes 2015: Representing device-tree peripherals in ACPI
 
Ixgbe internals
Ixgbe internalsIxgbe internals
Ixgbe internals
 
Programming guide for linux usb device drivers
Programming guide for linux usb device driversProgramming guide for linux usb device drivers
Programming guide for linux usb device drivers
 

Similaire à ISCA final presentation - Queuing Model

HSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattHSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattAMD Developer Central
 
HSA From A Software Perspective
HSA From A Software Perspective HSA From A Software Perspective
HSA From A Software Perspective HSA Foundation
 
ISCA Final Presentation - Intro
ISCA Final Presentation - IntroISCA Final Presentation - Intro
ISCA Final Presentation - IntroHSA Foundation
 
ISCA final presentation - Runtime
ISCA final presentation - RuntimeISCA final presentation - Runtime
ISCA final presentation - RuntimeHSA Foundation
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013HSA Foundation
 
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...Edge AI and Vision Alliance
 
ISCA final presentation - Queuing Model

  • 1. HSA QUEUING MODEL HAKAN PERSSON, SENIOR PRINCIPAL ENGINEER, ARM
  • 3. MOTIVATION (TODAY’S PICTURE)
      [Diagram: Application, OS and GPU lanes. Steps shown: Transfer buffer to GPU, Copy/Map Memory, Queue Job, Schedule Job, Start Job, Finish Job, Schedule Application, Get Buffer, Copy/Map Memory]
  • 5. REQUIREMENTS
      - Three key technologies are used to build the user mode queueing mechanism:
          - Shared Virtual Memory
          - System Coherency
          - Signaling
      - AQL (Architected Queueing Language) enables any agent to enqueue tasks
  • 7. SHARED VIRTUAL MEMORY (TODAY)
      - Multiple virtual memory address spaces
      [Diagram: CPU0 uses VIRTUAL MEMORY1 and the GPU uses VIRTUAL MEMORY2; VA1->PA1 and VA2->PA1 both map into PHYSICAL MEMORY]
  • 8. SHARED VIRTUAL MEMORY (HSA)
      - Common virtual memory for all HSA agents
      [Diagram: CPU0 and GPU share one VIRTUAL MEMORY; both use the same VA->PA mapping into PHYSICAL MEMORY]
  • 9. SHARED VIRTUAL MEMORY
      - Advantages
          - No mapping tricks, no copying back and forth between different physical addresses
          - Send pointers (not data) back and forth between HSA agents
      - Implications
          - Common page tables (and common interpretation of architectural semantics such as shareability, protection, etc.)
          - Common mechanisms for address translation (and for servicing address translation faults)
          - Concept of a process address space ID (PASID) to allow multiple per-process virtual address spaces within the system
  • 10. SHARED VIRTUAL MEMORY
      - Specifics
          - Minimum supported VA width is 48 bits for 64-bit systems and 32 bits for 32-bit systems
          - HSA agents may reserve VA ranges for internal use via system software
          - All HSA agents other than the host unit must use the lowest privilege level
          - If present, read/write access flags for page tables must be maintained by all agents
          - Read/write permissions apply to all HSA agents equally
  • 11. GETTING THERE …
      [Diagram: the Application / OS / GPU timeline from slide 3, with the same steps]
  • 13. CACHE COHERENCY DOMAINS (1/3)
      - Data accesses to the global memory segment from all HSA agents shall be coherent without the need for explicit cache maintenance
  • 14. CACHE COHERENCY DOMAINS (2/3)
      - Advantages
          - Composability
          - Reduced SW complexity when communicating between agents
          - Lower barrier to entry when porting software
      - Implications
          - Hardware coherency support between all HSA agents
              - Can take many forms
                  - Stand-alone snoop filters / directories
                  - Combined L3/filters
                  - Snoop-based systems (no filter)
                  - Etc.
  • 15. CACHE COHERENCY DOMAINS (3/3)
      - Specifics
          - No requirement for instruction memory accesses to be coherent
          - Only applies to the Primary memory type
          - No requirement for HSA agents to maintain coherency to any memory location where the HSA agents do not specify the same memory attributes
          - Read-only image data is required to remain static during the execution of an HSA kernel
              - No double mapping (via different attributes) in order to modify; the data must remain static
  • 16. GETTING CLOSER …
      [Diagram: the Application / OS / GPU timeline from slide 3, with the same steps]
  • 18. SIGNALING (1/3)
      - HSA agents support the ability to use signaling objects
          - All creation/destruction of signaling objects occurs via HSA runtime APIs
          - Signaling objects can be accessed directly from an HSA agent
              - Signal a signaling object (this wakes up HSA agents waiting on the object)
              - Query the current object
              - Wait on the current object (various conditions supported)
  • 19. SIGNALING (2/3)
      - Advantages
          - Enables asynchronous events between HSA agents, without involving the kernel
          - Common idiom for work offload
          - Low-power waiting
      - Implications
          - Runtime support required
          - Commonly implemented on top of cache coherency flows
  • 20. SIGNALING (3/3)
      - Specifics
          - Only supported within a PASID
          - Supported wait conditions are =, !=, < and >=
          - Wait operations may return sporadically (no guarantee against false positives)
              - Programmer must test
          - Wait operations have a maximum duration before returning
          - The HSAIL atomic operations are supported on signal objects
          - Signal objects are opaque
              - Must use dedicated HSAIL/HSA runtime operations
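Because a wait may return before its condition holds (a false positive, or the maximum wait duration expiring), callers are expected to re-test after every return. The sketch below illustrates that pattern only. `ToySignal`, `toyWaitGe` and `waitUntilGe` are hypothetical names, and `std::atomic` stands in for the opaque signal object; none of this is the HSA runtime API.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for an opaque HSA signal object (NOT the HSA runtime API).
struct ToySignal {
    std::atomic<int64_t> value{0};
};

// Toy ">=" wait with a maximum duration. It returns the last value observed,
// which may still be below the target if the time limit expired first.
inline int64_t toyWaitGe(const ToySignal& s, int64_t target,
                         std::chrono::microseconds maxWait) {
    const auto deadline = std::chrono::steady_clock::now() + maxWait;
    while (std::chrono::steady_clock::now() < deadline) {
        const int64_t v = s.value.load(std::memory_order_acquire);
        if (v >= target) return v;
        std::this_thread::yield();
    }
    return s.value.load(std::memory_order_acquire);
}

// Caller-side pattern: re-test the condition every time the wait returns.
inline int64_t waitUntilGe(const ToySignal& s, int64_t target) {
    int64_t v = toyWaitGe(s, target, std::chrono::microseconds(100));
    while (v < target) {
        v = toyWaitGe(s, target, std::chrono::microseconds(100));
    }
    assert(v >= target);
    return v;
}
```

The outer loop in `waitUntilGe` is the part that corresponds to "Programmer must test": the return value of a single wait is never trusted on its own.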
  • 21. ALMOST THERE…
      [Diagram: the Application / OS / GPU timeline from slide 3, with the same steps]
  • 23. ONE BLOCK LEFT
      [Diagram: the Application / OS / GPU timeline from slide 3, with the same steps]
  • 24. USER MODE QUEUEING (1/3)
      - User mode queueing
          - Enables user space applications to enqueue jobs (“Dispatch Packets”) for HSA agents directly, without OS intervention
          - Queues are created/destroyed via calls to the HSA runtime
          - One (or many) agents enqueue packets; a single agent dequeues packets
          - Requires coherency and shared virtual memory
  • 25. USER MODE QUEUEING (2/3)
      - Advantages
          - Avoids involving the kernel/driver when dispatching work for an agent
          - Lower-latency job dispatch enables finer granularity of offload
          - Standard memory protection mechanisms may be used to protect communication with the consuming agent
      - Implications
          - Packet formats/fields are architected: standard across vendors
              - Guaranteed backward compatibility
          - Packets are enqueued/dequeued via an architected protocol (all via memory accesses and signaling)
              - More on this later
  • 26. SUCCESS!
      [Diagram: the Application / OS / GPU timeline from slide 3, with the same steps]
  • 27. SUCCESS!
      [Diagram: reduced Application / OS / GPU timeline. Remaining steps: Queue Job, Start Job, Finish Job]
  • 29. ARCHITECTED QUEUEING LANGUAGE
      - HSA queues look just like standard shared memory queues, supporting multi-producer, single-consumer
          - Single-producer variant defined, with some optimizations possible
      - Queues consist of storage, read/write indices, ID, etc.
      - Queues are created/destroyed via calls to the HSA runtime
      - “Packets” are placed in queues directly from user mode, via an architected protocol
      - Packet format is architected
      [Diagram: two producers and one consumer around a packet array; Read Index and Write Index point into storage held in coherent, shared memory]
  • 30. ARCHITECTED QUEUEING LANGUAGE
      - Packets are read and dispatched for execution from the queue in order, but may complete in any order
          - There is no guarantee that more than one packet will be processed in parallel at a time
      - There may be many queues; a single agent may also consume from several queues
      - Any HSA agent may enqueue packets
          - CPUs
          - GPUs
          - Other accelerators
  • 31. QUEUE STRUCTURE
      Offset (bytes)  Size (bytes)  Field           Notes
      0               4             queueType       Differentiate different queues
      4               4             queueFeatures   Indicate supported features
      8               8             baseAddress     Pointer to packet array
      16              8             doorbellSignal  HSA signaling object handle
      24              4             size            Packet array cardinality
      28              4             queueId         Unique per process
      32              8             serviceQueue    Queue for callback services
      intrinsic       8             writeIndex      Packet array write index
      intrinsic       8             readIndex       Packet array read index
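As an illustration of the layout in the QUEUE STRUCTURE table, the sketch below lays the visible fields out as a C++ struct and checks the offsets at compile time. The struct name `QueueSketch` is hypothetical. Using fixed-width 64-bit integers for the pointer and handle fields is an assumption, and so is the 8-byte doorbell handle, which is what the offsets 16 and 24 imply. The intrinsic `readIndex` and `writeIndex` are deliberately left out because they are not visible in the structure.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative layout only. Field names follow the slide; everything else
// (struct name, integer types for pointers/handles) is an assumption.
struct QueueSketch {
    uint32_t queueType;       // offset 0:  differentiates queue kinds
    uint32_t queueFeatures;   // offset 4:  bitfield of supported features
    uint64_t baseAddress;     // offset 8:  address of the packet array
    uint64_t doorbellSignal;  // offset 16: signaling object handle
    uint32_t size;            // offset 24: number of packet slots
    uint32_t queueId;         // offset 28: unique per process
    uint64_t serviceQueue;    // offset 32: queue for callback services
};

static_assert(offsetof(QueueSketch, queueType) == 0, "queueType offset");
static_assert(offsetof(QueueSketch, queueFeatures) == 4, "queueFeatures offset");
static_assert(offsetof(QueueSketch, baseAddress) == 8, "baseAddress offset");
static_assert(offsetof(QueueSketch, doorbellSignal) == 16, "doorbellSignal offset");
static_assert(offsetof(QueueSketch, size) == 24, "size offset");
static_assert(offsetof(QueueSketch, queueId) == 28, "queueId offset");
static_assert(offsetof(QueueSketch, serviceQueue) == 32, "serviceQueue offset");
```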
  • 32. QUEUE VARIANTS
      - queueType and queueFeatures together define queue semantics and capabilities
      - Two queueType values defined, other values reserved:
          - MULTI: queue supports multiple producers
          - SINGLE: queue supports a single producer
      - queueFeatures is a bitfield indicating capabilities
          - DISPATCH (bit 0): if set, the queue supports DISPATCH packets
          - AGENT_DISPATCH (bit 1): if set, the queue supports AGENT_DISPATCH packets
          - All other bits are reserved and must be 0
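A minimal sketch of how the queueFeatures bitfield could be decoded. The bit positions (0 and 1) come from the slide. The numeric values chosen for MULTI and SINGLE are placeholders, since the slide does not give them, and all identifiers are hypothetical.

```cpp
#include <cstdint>

// Placeholder numeric values: the slide names MULTI and SINGLE but gives no encoding.
enum class QueueType : uint32_t { MULTI = 0, SINGLE = 1 };

// Bit positions as stated on the slide.
constexpr uint32_t kFeatureDispatch      = 1u << 0;
constexpr uint32_t kFeatureAgentDispatch = 1u << 1;
constexpr uint32_t kFeatureKnownMask     = kFeatureDispatch | kFeatureAgentDispatch;

// All bits other than the two defined ones are reserved and must be 0.
constexpr bool featuresValid(uint32_t features) {
    return (features & ~kFeatureKnownMask) == 0;
}

constexpr bool supportsDispatch(uint32_t features) {
    return (features & kFeatureDispatch) != 0;
}

constexpr bool supportsAgentDispatch(uint32_t features) {
    return (features & kFeatureAgentDispatch) != 0;
}
```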
  • 33. QUEUE STRUCTURE DETAILS
      - Queue doorbells are HSA signaling objects with restrictions
          - Created as part of the queue; lifetime tied to the queue object
          - Atomic read-modify-write not allowed
      - size field value must be a power of 2
      - serviceQueue can be used by an HSA kernel for callback services
          - Provided by the application when the queue is created
          - Can be mapped to an HSA runtime provided serviceQueue, an application serviced queue, or NULL if no serviceQueue is required
  • 34. READ/WRITE INDICES
      - readIndex and writeIndex properties are part of the queue, but not visible in the queue structure
          - Accessed through HSA runtime API and HSAIL operations
      - HSA runtime/HSAIL operations defined to
          - Read readIndex or writeIndex property
          - Write readIndex or writeIndex property
          - Add constant to writeIndex property (returns previous writeIndex value)
          - CAS on writeIndex property
      - readIndex and writeIndex operations are treated as atomic in the memory model
          - relaxed, acquire, release and acquire-release variants defined as applicable
      - readIndex and writeIndex never wrap
      - PacketID: the index of a particular packet
          - Uniquely identifies each packet of a queue
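Since the indices never wrap and the size is a power of 2, a packetID maps to a slot in the packet array with a simple mask, and fullness follows from comparing the packetID against readIndex + size. The helpers below sketch that arithmetic; their names are hypothetical.

```cpp
#include <cstdint>

// True when n is a non-zero power of two (required for the mask below to work).
constexpr bool isPowerOfTwo(uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Slot in the packet array for a monotonically increasing packetID.
constexpr uint32_t slotFor(uint64_t packetID, uint32_t size) {
    return static_cast<uint32_t>(packetID & (static_cast<uint64_t>(size) - 1));
}

// A producer may write the packet once packetID < readIndex + size.
constexpr bool slotAvailable(uint64_t packetID, uint64_t readIndex, uint32_t size) {
    return packetID < readIndex + size;
}

// Packets reserved but not yet consumed.
constexpr uint64_t packetsInFlight(uint64_t writeIndex, uint64_t readIndex) {
    return writeIndex - readIndex;
}
```

With a queue of 8 slots, packetIDs 1, 9 and 17 all land in slot 1, yet each packetID still identifies a distinct packet because the indices themselves keep growing.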
  • 35. PACKET ENQUEUE
      - Packet enqueue follows a few simple steps:
          - Reserve space
              - Multiple packets can be reserved at a time
          - Write packet to queue
          - Mark packet as valid
              - Producer is no longer allowed to modify the packet
              - Consumer is allowed to start processing the packet
          - Notify consumer of packet through the queue doorbell
              - Multiple packets can be notified at a time
              - Doorbell signal should be signaled with the last packetID notified
              - On the small machine model the lower 32 bits of the packetID are used
  • 36. PACKET RESERVATION
      - Two flows envisaged
      - Atomic add to writeIndex with the number of packets to reserve
          - Producer must wait until packetID < readIndex + size before writing to the packet
          - Queue can be sized so that the wait is unlikely (or impossible)
          - Suitable when many threads use one queue
      - Check that the queue is not full first, then use atomic CAS to update writeIndex
          - Can be inefficient if many threads use the same queue
          - Allows a different failure model if the queue is congested
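The second flow (check first, then CAS) can be sketched as follows. This is an illustrative model using `std::atomic`, not HSA runtime code; `ReserveModel` and `tryReserve` are hypothetical names. Returning `false` instead of waiting is one possible "different failure model" for a congested queue.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical queue model holding only what reservation needs.
struct ReserveModel {
    std::atomic<uint64_t> readIndex{0};
    std::atomic<uint64_t> writeIndex{0};
    uint64_t size;  // packet array cardinality, a power of 2
    explicit ReserveModel(uint64_t n) : size(n) {}
};

// Try to reserve `count` consecutive packets without ever over-committing.
// On success, firstPacketID receives the ID of the first reserved packet.
inline bool tryReserve(ReserveModel& q, uint64_t count, uint64_t& firstPacketID) {
    uint64_t wr = q.writeIndex.load(std::memory_order_relaxed);
    for (;;) {
        uint64_t rd = q.readIndex.load(std::memory_order_acquire);
        // The last reserved packetID must satisfy packetID < readIndex + size.
        if (wr + count - rd > q.size) {
            return false;  // queue congested: the caller decides what to do
        }
        if (q.writeIndex.compare_exchange_weak(wr, wr + count,
                                               std::memory_order_relaxed)) {
            firstPacketID = wr;
            return true;
        }
        // CAS failed: wr now holds the current writeIndex, so re-check and retry.
    }
}
```

Under heavy contention the retry loop is where the inefficiency noted on the slide comes from: each failed CAS costs another round of loads.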
  • 37. QUEUE OPTIMIZATIONS
      - Queue behavior is loosely defined to allow optimizations
      - Some potential producer behavior optimizations:
          - Keep local copy of readIndex, update when required
          - For single producer queues:
              - Keep local copy of writeIndex
              - Use store operation rather than add/CAS atomic to update writeIndex
      - Some potential consumer behavior optimizations:
          - Use packet format field to determine whether a packet has been submitted rather than writeIndex property
          - Speculatively read multiple packets from the queue
          - Not update readIndex for each packet processed
          - Rely on value used for doorbellSignal to notify new packets
              - Especially useful for single producer queues
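The single-producer optimizations can be illustrated with a small model. This is a sketch under assumptions, not an HSA implementation: the class name and its interface are hypothetical, and the shared indices are plain `std::atomic` counters. The producer keeps private copies of both indices, refreshes its view of readIndex only when the cached value says the queue is full, and publishes writeIndex with a store rather than an add or CAS.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical single-producer helper around shared read/write indices.
class SingleProducer {
public:
    SingleProducer(std::atomic<uint64_t>& readIndex,
                   std::atomic<uint64_t>& writeIndex, uint64_t size)
        : readIndex_(readIndex), writeIndex_(writeIndex), size_(size) {}

    // Reserve one packet; returns false if the queue is full.
    bool tryReserve(uint64_t& packetID) {
        if (localWrite_ - cachedRead_ >= size_) {
            // Cached readIndex says "full": refresh it once before giving up.
            cachedRead_ = readIndex_.load(std::memory_order_acquire);
            if (localWrite_ - cachedRead_ >= size_) {
                return false;
            }
        }
        packetID = localWrite_++;
        // Only one producer exists, so a plain store is sufficient.
        writeIndex_.store(localWrite_, std::memory_order_relaxed);
        return true;
    }

private:
    std::atomic<uint64_t>& readIndex_;
    std::atomic<uint64_t>& writeIndex_;
    uint64_t size_;
    uint64_t localWrite_ = 0;  // local copy of writeIndex
    uint64_t cachedRead_ = 0;  // local copy of readIndex, refreshed on demand
};
```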
  • 38. POTENTIAL MULTI-PRODUCER ALGORITHM
      // Allocate a packet; the add returns the previous writeIndex value
      uint64_t packetID = hsa_queue_add_write_index_relaxed(q, 1);

      // Wait until the queue is no longer full
      uint64_t rdIdx;
      do {
        rdIdx = hsa_queue_load_read_index_relaxed(q);
      } while (packetID >= (rdIdx + q->size));

      // Calculate the array index (size is a power of 2)
      uint32_t arrayIdx = packetID & (q->size-1);

      // Copy over the packet; the format field is INVALID
      q->baseAddress[arrayIdx] = pkt;

      // Update the format field with release semantics
      q->baseAddress[arrayIdx].hdr.format.store(DISPATCH, std::memory_order_release);

      // Ring the doorbell (relaxed here; the release is on the format store above).
      // Could also amortize over multiple packets.
      hsa_signal_send_relaxed(q->doorbellSignal, packetID);
  • 39. POTENTIAL CONSUMER ALGORITHM
      // Get the location of the next packet
      uint64_t readIndex = hsa_queue_load_read_index_relaxed(q);

      // Calculate the array index (size is a power of 2)
      uint32_t arrayIdx = readIndex & (q->size-1);

      // Spin while empty (could also perform a low-power wait on the doorbell)
      while (INVALID == q->baseAddress[arrayIdx].hdr.format) { }

      // Copy over the packet
      pkt = q->baseAddress[arrayIdx];

      // Set the format field to INVALID so the slot can be reused
      q->baseAddress[arrayIdx].hdr.format.store(INVALID, std::memory_order_relaxed);

      // Update the readIndex using the HSA intrinsic
      hsa_queue_store_read_index_relaxed(q, readIndex+1);

      // Now process <pkt>!
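The producer and consumer algorithms can be combined into one self-contained model that compiles without an HSA runtime. This is a sketch under assumptions: `MiniQueue`, `Slot`, the format encodings and the function names are all hypothetical, the doorbell is omitted, and the consumer returns `false` on an empty queue instead of spinning. It shows the core handshake: the producer publishes a packet with a release store to the format field, and the consumer observes it with an acquire load.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

constexpr uint8_t FORMAT_INVALID = 0;   // hypothetical encoding
constexpr uint8_t FORMAT_DISPATCH = 1;  // hypothetical encoding
constexpr uint64_t kSize = 4;           // packet array cardinality, power of 2

struct Slot {
    std::atomic<uint8_t> format{FORMAT_INVALID};
    uint32_t payload{0};  // stands in for the rest of the packet
};

struct MiniQueue {
    std::array<Slot, kSize> slots;
    std::atomic<uint64_t> readIndex{0};
    std::atomic<uint64_t> writeIndex{0};
};

// Multi-producer enqueue: reserve, wait for space, write, then mark valid.
inline uint64_t enqueue(MiniQueue& q, uint32_t payload) {
    uint64_t packetID = q.writeIndex.fetch_add(1, std::memory_order_relaxed);
    // Spins while the queue is full, as in the slide's algorithm.
    while (packetID >= q.readIndex.load(std::memory_order_acquire) + kSize) { }
    Slot& s = q.slots[packetID & (kSize - 1)];
    s.payload = payload;
    s.format.store(FORMAT_DISPATCH, std::memory_order_release);  // publish
    return packetID;
}

// Consumer: uses the format field, not writeIndex, to detect a packet.
inline bool tryDequeue(MiniQueue& q, uint32_t& out) {
    uint64_t rd = q.readIndex.load(std::memory_order_relaxed);
    Slot& s = q.slots[rd & (kSize - 1)];
    if (s.format.load(std::memory_order_acquire) == FORMAT_INVALID) {
        return false;  // nothing submitted at this index yet
    }
    out = s.payload;
    s.format.store(FORMAT_INVALID, std::memory_order_relaxed);
    q.readIndex.store(rd + 1, std::memory_order_release);  // frees the slot
    return true;
}
```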
  • 41. PACKETS
      - Packets have architected layouts: the special Always reserved and Invalid formats, plus three main types
      - Always reserved & Invalid
          - Do not contain any valid tasks and are not processed (queue will not progress)
      - Dispatch
          - Specifies kernel execution over a grid
      - Agent Dispatch
          - Specifies a single function to perform with a set of parameters
      - Barrier
          - Used for task dependencies
  • 42. COMMON PACKET HEADER
      Start offset (bytes): 0. Format: uint16_t.
      Field name           Description
      format:8             Contains the packet type (Always reserved, Invalid, Dispatch, Agent Dispatch, and Barrier). Other values are reserved and should not be used.
      barrier:1            If set, processing of the packet will only begin when all preceding packets are complete.
      acquireFenceScope:2  Determines the scope and type of the memory fence operation applied before the packet enters the active phase. Must be 0 for Barrier packets.
      releaseFenceScope:2  Determines the scope and type of the memory fence operation applied after kernel completion but before the packet is completed.
      reserved:3           Must be 0.
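The 16-bit header can be assembled and decoded with shifts and masks. The sketch below assumes that the fields are packed from the least significant bit upward in the order the table lists them (format in bits 0-7, barrier in bit 8, and so on); the slide gives the widths but not the bit positions, so that ordering and all function names are assumptions.

```cpp
#include <cstdint>

// Assumed bit positions: fields packed from bit 0 in table order.
constexpr unsigned BARRIER_SHIFT = 8;
constexpr unsigned ACQUIRE_SHIFT = 9;
constexpr unsigned RELEASE_SHIFT = 11;

// Build a header; the top three reserved bits are left at 0.
constexpr uint16_t makeHeader(uint8_t format, bool barrier,
                              uint8_t acquireFenceScope, uint8_t releaseFenceScope) {
    return static_cast<uint16_t>(
        static_cast<unsigned>(format) |
        ((barrier ? 1u : 0u) << BARRIER_SHIFT) |
        ((acquireFenceScope & 0x3u) << ACQUIRE_SHIFT) |
        ((releaseFenceScope & 0x3u) << RELEASE_SHIFT));
}

constexpr uint8_t headerFormat(uint16_t h) { return static_cast<uint8_t>(h & 0xFFu); }
constexpr bool headerBarrier(uint16_t h) { return ((h >> BARRIER_SHIFT) & 0x1u) != 0; }
constexpr uint8_t headerAcquireScope(uint16_t h) {
    return static_cast<uint8_t>((h >> ACQUIRE_SHIFT) & 0x3u);
}
constexpr uint8_t headerReleaseScope(uint16_t h) {
    return static_cast<uint8_t>((h >> RELEASE_SHIFT) & 0x3u);
}
```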
  • 43. DISPATCH PACKET
      Start offset (bytes)  Format    Field name               Description
      0                     uint16_t  header                   Packet header
      2                     uint16_t  dimensions:2             Number of dimensions specified in gridSize. Valid values are 1, 2, or 3.
                                      reserved:14              Must be 0.
      4                     uint16_t  workgroupSize.x          x dimension of work-group (measured in work-items).
      6                     uint16_t  workgroupSize.y          y dimension of work-group (measured in work-items).
      8                     uint16_t  workgroupSize.z          z dimension of work-group (measured in work-items).
      10                    uint16_t  reserved2                Must be 0.
      12                    uint32_t  gridSize.x               x dimension of grid (measured in work-items).
      16                    uint32_t  gridSize.y               y dimension of grid (measured in work-items).
      20                    uint32_t  gridSize.z               z dimension of grid (measured in work-items).
      24                    uint32_t  privateSegmentSizeBytes  Total size in bytes of private memory allocation request (per work-item).
      28                    uint32_t  groupSegmentSizeBytes    Total size in bytes of group memory allocation request (per work-group).
      32                    uint64_t  kernelObjectAddress      Address of an object in memory that includes an implementation-defined executable ISA image for the kernel.
      40                    uint64_t  kernargAddress           Address of memory containing kernel arguments.
      48                    uint64_t  reserved3                Must be 0.
      56                    uint64_t  completionSignal         Address of HSA signaling object used to indicate completion of the job.
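The table maps directly onto a 64-byte struct. The following is an illustrative mirror rather than the runtime's own type: the struct name, the flattened field names (`workgroupSizeX` for `workgroupSize.x`, and so on) and the helper `totalWorkItems` are hypothetical, and `dimensions` is stored as a whole `uint16_t` whose low two bits carry the value.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the dispatch packet layout.
struct DispatchPacket {
    uint16_t header;                   // offset 0
    uint16_t dimensions;               // offset 2: low 2 bits used, rest reserved
    uint16_t workgroupSizeX;           // offset 4
    uint16_t workgroupSizeY;           // offset 6
    uint16_t workgroupSizeZ;           // offset 8
    uint16_t reserved2;                // offset 10: must be 0
    uint32_t gridSizeX;                // offset 12
    uint32_t gridSizeY;                // offset 16
    uint32_t gridSizeZ;                // offset 20
    uint32_t privateSegmentSizeBytes;  // offset 24
    uint32_t groupSegmentSizeBytes;    // offset 28
    uint64_t kernelObjectAddress;      // offset 32
    uint64_t kernargAddress;           // offset 40
    uint64_t reserved3;                // offset 48: must be 0
    uint64_t completionSignal;         // offset 56
};

static_assert(offsetof(DispatchPacket, gridSizeX) == 12, "gridSize.x offset");
static_assert(offsetof(DispatchPacket, kernelObjectAddress) == 32, "kernelObjectAddress offset");
static_assert(offsetof(DispatchPacket, completionSignal) == 56, "completionSignal offset");
static_assert(sizeof(DispatchPacket) == 64, "dispatch packet is 64 bytes");

// Number of work-items in the grid, counting only the dimensions in use.
inline uint64_t totalWorkItems(const DispatchPacket& p) {
    const uint32_t grid[3] = {p.gridSizeX, p.gridSizeY, p.gridSizeZ};
    uint64_t total = 1;
    for (unsigned d = 0; d < (p.dimensions & 0x3u); ++d) {
        total *= grid[d];
    }
    return total;
}
```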
  • 44. AGENT DISPATCH PACKET
      Start offset (bytes)  Format    Field name        Description
      0                     uint16_t  header            Packet header
      2                     uint16_t  type              The function to be performed by the destination Agent. The type value is split into the following ranges:
                                                          0x0000:0x3FFF: Vendor specific
                                                          0x4000:0x7FFF: HSA runtime
                                                          0x8000:0xFFFF: User registered function
      4                     uint32_t  reserved2         Must be 0.
      8                     uint64_t  returnLocation    Pointer to location to store the function return value in.
      16                    uint64_t  arg[0]            64-bit direct or indirect arguments.
      24                    uint64_t  arg[1]
      32                    uint64_t  arg[2]
      40                    uint64_t  arg[3]
      48                    uint64_t  reserved3         Must be 0.
      56                    uint64_t  completionSignal  Address of HSA signaling object used to indicate completion of the job.
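A sketch of the agent dispatch layout together with a classifier for the three `type` ranges. The struct, the enum and the function name are hypothetical; only the offsets and the range boundaries are taken from the table.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the agent dispatch packet layout.
struct AgentDispatchPacket {
    uint16_t header;            // offset 0
    uint16_t type;              // offset 2: function to perform
    uint32_t reserved2;         // offset 4: must be 0
    uint64_t returnLocation;    // offset 8
    uint64_t arg[4];            // offsets 16, 24, 32, 40
    uint64_t reserved3;         // offset 48: must be 0
    uint64_t completionSignal;  // offset 56
};

static_assert(offsetof(AgentDispatchPacket, returnLocation) == 8, "returnLocation offset");
static_assert(offsetof(AgentDispatchPacket, arg) == 16, "arg[0] offset");
static_assert(offsetof(AgentDispatchPacket, completionSignal) == 56, "completionSignal offset");
static_assert(sizeof(AgentDispatchPacket) == 64, "agent dispatch packet is 64 bytes");

enum class AgentFunctionClass { VendorSpecific, HsaRuntime, UserRegistered };

// Map a type value onto the three ranges given in the table.
inline AgentFunctionClass classifyAgentDispatchType(uint16_t type) {
    if (type <= 0x3FFF) return AgentFunctionClass::VendorSpecific;  // 0x0000:0x3FFF
    if (type <= 0x7FFF) return AgentFunctionClass::HsaRuntime;      // 0x4000:0x7FFF
    return AgentFunctionClass::UserRegistered;                      // 0x8000:0xFFFF
}
```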
  • 45. BARRIER PACKET
      - Used for specifying dependences between packets
      - HSA agent will not launch any further packets from this queue until the barrier packet signal conditions are met
      - Used for specifying dependences on packets dispatched from any queue.
      - Execution phase completes only when all of the dependent signals (up to five) have been signaled (with the value of 0).
          - Or if an error has occurred in one of the packets upon which we have a dependence.
  • 46. BARRIER PACKET
      Start offset (bytes)  Format    Field name        Description
      0                     uint16_t  header            Packet header, see 2.8.1 Packet header (p. 16).
      2                     uint16_t  reserved2         Must be 0.
      4                     uint32_t  reserved3         Must be 0.
      8                     uint64_t  depSignal0        Address of dependent signaling objects to be evaluated by the packet processor.
      16                    uint64_t  depSignal1
      24                    uint64_t  depSignal2
      32                    uint64_t  depSignal3
      40                    uint64_t  depSignal4
      48                    uint64_t  reserved4         Must be 0.
      56                    uint64_t  completionSignal  Address of HSA signaling object used to indicate completion of the job.
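A sketch of the barrier packet layout and of the condition the packet processor evaluates. The struct and function names are hypothetical, signals are modeled as `std::atomic<int64_t>` rather than real HSA signaling objects, and treating a null pointer as an unused dependency slot is an assumption; the error case mentioned on the previous slide is not modeled.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the barrier packet layout.
struct BarrierPacket {
    uint16_t header;            // offset 0
    uint16_t reserved2;         // offset 2: must be 0
    uint32_t reserved3;         // offset 4: must be 0
    uint64_t depSignal[5];      // offsets 8, 16, 24, 32, 40
    uint64_t reserved4;         // offset 48: must be 0
    uint64_t completionSignal;  // offset 56
};

static_assert(offsetof(BarrierPacket, depSignal) == 8, "depSignal0 offset");
static_assert(offsetof(BarrierPacket, reserved4) == 48, "reserved4 offset");
static_assert(sizeof(BarrierPacket) == 64, "barrier packet is 64 bytes");

// The barrier is satisfied once every dependent signal in use has value 0.
// Assumption: a null entry means the slot is unused.
inline bool barrierSatisfied(const std::atomic<int64_t>* const* deps, int count = 5) {
    for (int i = 0; i < count; ++i) {
        if (deps[i] != nullptr && deps[i]->load(std::memory_order_acquire) != 0) {
            return false;
        }
    }
    return true;
}
```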
  • 47. DEPENDENCES
      - A user may never assume more than one packet is being executed by an HSA agent at a time.
      - Implications:
          - Packets can’t poll on shared memory values which will be set by packets issued from other queues, unless the user has ensured the proper ordering.
          - To ensure all previous packets from a queue have been completed, use the Barrier bit.
          - To ensure specific packets from any queue have completed, use the Barrier packet.
  • 48. HSA QUEUEING, PACKET EXECUTION
  • 49. PACKET EXECUTION
      - Launch phase
          - Initiated when launch conditions are met
              - All preceding packets in the queue must have exited launch phase
              - If the barrier bit in the packet header is set, then all preceding packets in the queue must have exited completion phase
          - Includes memory acquire fence
      - Active phase
          - Execute the packet
          - Barrier packets remain in Active phase until conditions are met.
      - Completion phase
          - First step is memory release fence: make results visible.
          - completionSignal field is then signaled with a decrementing atomic.
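The launch conditions and the completion step can be written down as two small functions. This is a simplified model, not packet processor code: the function names and the counter-based bookkeeping are hypothetical, and the completion signal is a plain `std::atomic<int64_t>`.

```cpp
#include <atomic>
#include <cstdint>

// Launch condition for the packet at position `index` in a queue.
// exitedLaunch / exitedCompletion count how many of the preceding packets
// (positions 0 .. index-1) have left the respective phase.
inline bool canLaunch(bool barrierBit, uint64_t index,
                      uint64_t exitedLaunch, uint64_t exitedCompletion) {
    if (exitedLaunch < index) {
        return false;  // a preceding packet is still in its launch phase
    }
    if (barrierBit && exitedCompletion < index) {
        return false;  // barrier bit: every preceding packet must be complete
    }
    return true;
}

// Completion phase: release fence first, then decrement the signal.
inline void completePacket(std::atomic<int64_t>& completionSignal) {
    std::atomic_thread_fence(std::memory_order_release);  // make results visible
    completionSignal.fetch_sub(1, std::memory_order_relaxed);
}
```

A waiter that initialised the signal to 1 would treat the value 0 as "packet complete", which is the convention the barrier packet slides use for dependent signals.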
  • 50. PACKET EXECUTION – BARRIER BIT
      [Timeline figure: launch, execute and complete phases of Pkt1 and Pkt2, followed by the launch and execute phases of Pkt3 (barrier=1), plotted against time.]
      Pkt3 launches when all preceding packets in the queue have completed.
  • 51. PUTTING IT ALL TOGETHER (FFT)
      [Figure: Packets 1 to 6 operating on X[0] to X[7], separated by two barriers, plotted against time.]
  • 52. PUTTING IT ALL TOGETHER
      AQL Pseudo Code

      // Send the packets to do the first stage.
      aql_dispatch(pkt1);
      aql_dispatch(pkt2);

      // Send the next two packets, setting the barrier bit so we
      // know packets 1 & 2 will be complete before 3 and 4 are
      // launched.
      aql_dispatch_with_barrier_bit(pkt3);
      aql_dispatch(pkt4);

      // Same as above (make sure 3 & 4 are done before issuing 5
      // & 6).
      aql_dispatch_with_barrier_bit(pkt5);
      aql_dispatch(pkt6);

      // This packet will notify us when 5 & 6 are complete.
      aql_dispatch_with_barrier_bit(finish_pkt);
  • 53. PACKET EXECUTION – BARRIER PACKET
      [Timeline figure: queue Q1 holds task T1; queue Q2 holds a Barrier packet followed by task T2. Signal X is initialised to 1 and appears both as a completionSignal and as the Barrier's depSignal0. Phases shown against time: Barrier launch, T1 launch, Barrier execute, T1 execute, Barrier complete, T1 complete, T2 launch, T2 execute, T2 complete.]
      - Completion decrements signal X
      - Barrier completes when signal X is signaled with 0
      - T2 launches once the barrier is complete
  • 54. DEPTH FIRST CHILD TASK EXECUTION
      - Consider two generations of child tasks
          - Task T submits tasks T.1 & T.2
          - Task T.1 submits tasks T.1.1 & T.1.2
          - Task T.2 submits tasks T.2.1 & T.2.2
      - Desired outcome
          - Depth first child task execution
          - I.e. T → T.1 → T.1.1 → T.1.2 → T.2 → T.2.1 → T.2.2
          - T is passed a signal (allComplete) to decrement when all tasks are complete (T and its children etc.)
      [Figure: task tree with root T, children T.1 and T.2, and their children T.1.1, T.1.2, T.2.1 and T.2.2.]
  • 55. HOW TO DO THIS WITH HSA QUEUES?
      - Use a separate user mode queue for each recursion level
          - Task T submits to queue Q1
          - Tasks T.1 & T.2 submit tasks to queue Q2
          - Queues could be passed in as parameters to task T
      - Depth first requires ordering of T.1, T.2 and their children
          - Use an additional signal object (childrenComplete) to track completion of the children of T.1 & T.2
          - childrenComplete is set to the number of children (i.e. 2) by each of T.1 & T.2
  • 56. A PICTURE SAYS MORE THAN 1000 WORDS
      [Figure: the task tree of T mapped onto two queues. Q1 holds T.1, Barrier, T.2, Barrier; Q2 holds T.1.1, T.1.2, T.2.1, T.2.2. Labels: "Wait on childrenComplete" and "Signal allComplete".]
  • 57. SUMMARY
  • 58. KEY HSA TECHNOLOGIES
      - HSA combines several mechanisms to enable low overhead task dispatch
          - Shared Virtual Memory
          - System Coherency
          - Signaling
          - AQL
      - User mode queues: from any compatible agent
      - Architected packet format
      - Rich dependency mechanism
      - Flexible and efficient signaling of completion
  • 59. QUESTIONS?