Massively Parallel RISC-V Processing with Transactional Memory
1. © 2018 NETRONOME SYSTEMS, INC.
December 3 - 6, 2018
Santa Clara Convention Center
CA, USA
REVOLUTIONIZING THE COMPUTING LANDSCAPE AND BEYOND.
https://tmt.knect365.com/risc-v-summit
@risc_v
2. MASSIVELY PARALLEL RISC-V PROCESSING WITH TRANSACTIONAL MEMORY
Steven Zagorianakos
VP Silicon Development
Netronome
3. Introduction
• Discuss transactional memories
• Walk through an example implementation using transactional memories and RISC-V harts
• Full chip, island, cluster, and groups of RISC-V harts
• RISC-V feature set for RFPC
• Summary
4. “Transactional Memory”
• Transactional memory hierarchy
▶ Memory
▶ Closely coupled
▶ Threaded processing engines
▶ And hardwired transaction types
▶ Atomics
▶ CRC
▶ Crypto
• Many, many CPU cores
▶ But still running arbitrary C code of any size ...
• Require
▶ Many cores
▶ Efficient command dispatch / fetch / result / synchronization
• (Not interrupt-based, for example…)
▶ WFE
▶ Currently planned as Custom-1
[Diagram label: Instruction-Driven Switch Fabric]
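The dispatch / result / synchronization pattern on this slide can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the class names (`AtomicEngine`, `Thread`), the `add`/`set` operations, and the signal numbering are all hypothetical and do not come from the slides. The idea shown is that a thread posts a command to a memory-side engine, continues without blocking, and later waits on a completion signal instead of taking an interrupt.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Thread:
    """Hypothetical hardware thread that receives results and signals."""
    name: str
    signals: set = field(default_factory=set)
    results: dict = field(default_factory=dict)

    def wait_for_event(self, signal, engine):
        # Stand-in for a WFE-style wait. In this model the waiting thread
        # simply advances the engine until its signal has been raised.
        while signal not in self.signals:
            if not engine.queue:
                raise RuntimeError("waiting on a signal nothing will raise")
            engine.step()
        self.signals.discard(signal)
        return self.results.pop(signal)


@dataclass
class AtomicEngine:
    """Hypothetical memory-side engine executing hardwired transactions."""
    memory: dict = field(default_factory=dict)
    queue: deque = field(default_factory=deque)

    def post(self, op, addr, value, thread, signal):
        # Posted: the command is queued and the caller continues at once.
        self.queue.append((op, addr, value, thread, signal))

    def step(self):
        # Execute one queued command next to the memory, then signal.
        if not self.queue:
            return
        op, addr, value, thread, signal = self.queue.popleft()
        old = self.memory.get(addr, 0)
        if op == "add":
            self.memory[addr] = old + value
        elif op == "set":
            self.memory[addr] = value
        else:
            raise ValueError(f"unknown op {op!r}")
        thread.results[signal] = old   # result returned to the thread
        thread.signals.add(signal)     # completion signal, not an interrupt


# Two threads update the same counter without locks: the engine
# serializes the read-modify-write operations at the memory.
engine = AtomicEngine()
t0, t1 = Thread("t0"), Thread("t1")
engine.post("add", 0x100, 5, t0, signal=1)
engine.post("add", 0x100, 7, t1, signal=1)
old0 = t0.wait_for_event(1, engine)
old1 = t1.wait_for_event(1, engine)
```

Because the read-modify-write happens at the memory, neither thread holds a lock or stalls on a round trip between posting and waiting; other work could be placed between `post` and `wait_for_event`.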
5. A Practical Implementation
[Diagram: full-chip floorplan. Blocks shown: seven RFPC islands (~100 cores each), four SRAM memory islands, a DRAM-backed memory island, DRAM cache, a host interface island, a config island, an expansion island, a network interface island, host memory, and the host.]
• The chip or chiplet is made up of islands, which are connected through the instruction-driven switch fabric
• This allows implementations to scale from small to large
• The memory hierarchy provides equal access to all types of memory
• The config, host interface, and network interface islands feed data into the system
• Basic flow of data in a SmartNIC
6. RFPC Island
[Diagram: RFPC island. Blocks shown: three RFPC clusters (many RFPC cores each), local scratch memory, config/island bridge, Tile Link to island bus agent, three slice caches, island bus, and global bus. Path labels: transactional memory ops (datapath: posted coprocessor and memory transactions); Tile Link (caching data/instructions, C memory structures, etc.); remote-cache coherency ops.]
7. RFPC Cluster
[Diagram: RFPC cluster. Blocks shown: four RFPC groups (~10 cores each), two island bus interfaces, a Tile Link interface (manages binding), and a local prefetch/write buffer. Path labels: load/store; island bus (datapath: posted coprocessor and memory transactions); transactional memory ops; Tile Link (caching data/instructions, C memory structures, etc.); remote-cache coherency ops.]
8. RFPC Group
[Diagram: RFPC group, with several cores per RFPC group. Labels shown: RFPC core; coproc (multiply +); signals/timers; RISC-V pipeline; internal cmd/atomic/prefetch/write buffer; transactional memory ops; remote-cache coherency ops; local shared memory (code, high-speed thread-local data structures); data; prefetch/write buffer; instruction fetch.]
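The island / cluster / group / core hierarchy from the preceding diagrams can be summarized as a small model. This is a rough sketch, not the actual chip configuration: the block counts (7 RFPC islands, 3 clusters per island, 4 groups per cluster) are simply the numbers of blocks drawn in the diagrams, which may be illustrative, the 10 cores per group is the slides' "~10", and the flat hart numbering in `locate` is a hypothetical scheme invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RfpcTopology:
    """Hypothetical description of the RFPC hierarchy."""
    islands: int
    clusters_per_island: int
    groups_per_cluster: int
    cores_per_group: int

    @property
    def cores_per_cluster(self):
        return self.groups_per_cluster * self.cores_per_group

    @property
    def cores_per_island(self):
        return self.clusters_per_island * self.cores_per_cluster

    @property
    def total_cores(self):
        return self.islands * self.cores_per_island

    def locate(self, hart_index):
        """Map a flat hart index to (island, cluster, group, core)."""
        if not 0 <= hart_index < self.total_cores:
            raise IndexError(f"hart index {hart_index} out of range")
        island, rest = divmod(hart_index, self.cores_per_island)
        cluster, rest = divmod(rest, self.cores_per_cluster)
        group, core = divmod(rest, self.cores_per_group)
        return island, cluster, group, core


# Block counts as drawn in the diagrams (assumed, possibly illustrative).
topo = RfpcTopology(islands=7, clusters_per_island=3,
                    groups_per_cluster=4, cores_per_group=10)
per_island = topo.cores_per_island   # same order as the slides' "~100"
total = topo.total_cores
first_of_second_island = topo.locate(topo.cores_per_island)
last_hart = topo.locate(total - 1)
```

With these assumed counts an island holds 120 cores, the same order as the "~100 cores" quoted per island, and the chip as drawn holds several hundred cores.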
9. RISC-V Feature Set for RFPC
• RFPC cores are RV32IMC cores with custom-0/1 instructions
▶ RV32IMC keeps performance high with a low silicon gate count; it supports User, Machine, and Debug modes only, but provides some memory protection and both user-level and machine-level interrupts
▶ Custom-0 instructions permit dynamic binding of 48+-bit host addresses and bulk DDR addresses to 32-bit RISC-V addresses
▶ Custom-1 instructions permit transactional memory and signaling operations
• RFPC cores are collected into RFPC groups
▶ Sharing local memory, which is directly accessed (not cached)
▶ Simple address translation permits core-local data and stack without changing code and register initialization values
• RFPC groups are collected into RFPC clusters
▶ Transaction initiation and signal handling (for transaction acceptance/completion) are also handled in the island bus interfaces
▶ Island bus access is through a shared memory, and local transactional (atomic pipeline) memory is shared within the cluster only; access to the cache slices is non-transactional
• RFPC clusters are collected together
▶ A RISC-V Debug module shared among 40 cores permits JTAG-based debugging of every core
▶ The slices of cache combine as an ‘L2’ cache
▶ Provides windowing to 48-bit PCIe and 40-bit MU address spaces
• RFPC is size- and performance-optimized
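The binding of wide addresses to 32-bit RISC-V addresses described on this slide can be pictured as a table of windows. The sketch below is a software model under assumptions: the `Window` and `WindowTable` names, the window sizes, and every address value are invented for illustration, and the slides do not say how the real binding hardware or the custom-0 instructions are organized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    local_base: int  # start of the window in the 32-bit hart address space
    size: int        # window size in bytes
    wide_base: int   # bound base address in the wide address space
    wide_bits: int   # width of the wide address space (e.g. 48 or 40)


class WindowTable:
    """Hypothetical table binding 32-bit ranges to wide address ranges."""

    def __init__(self):
        self._windows = []

    def bind(self, local_base, size, wide_base, wide_bits):
        if size <= 0 or local_base < 0 or local_base + size > 1 << 32:
            raise ValueError("window does not fit in the 32-bit space")
        if wide_base < 0 or wide_base + size > 1 << wide_bits:
            raise ValueError("window does not fit in the wide space")
        for w in self._windows:
            if local_base < w.local_base + w.size and w.local_base < local_base + size:
                raise ValueError("window overlaps an existing binding")
        self._windows.append(Window(local_base, size, wide_base, wide_bits))

    def translate(self, addr32):
        # Find the window containing the address and add its offset
        # to the bound wide base.
        for w in self._windows:
            offset = addr32 - w.local_base
            if 0 <= offset < w.size:
                return w.wide_base + offset
        raise LookupError(f"no window bound at {addr32:#010x}")


table = WindowTable()
# A 1 MiB window onto a 48-bit PCIe address (all values are made up).
table.bind(0x8000_0000, 0x10_0000, 0x1234_5670_0000, wide_bits=48)
# A 4 KiB window onto a 40-bit MU address (all values are made up).
table.bind(0x9000_0000, 0x1000, 0xAB_CDEF_0000, wide_bits=40)
pcie_addr = table.translate(0x8000_0040)
mu_addr = table.translate(0x9000_0FFF)
```

Rebinding a window changes which wide region a fixed 32-bit range refers to, so the code running on the hart keeps using ordinary 32-bit loads and stores.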
10. Summary
• RISC-V harts are well suited as the processors required to implement a thousand-CPU SmartNIC.
• RISC-V solutions can be tailored to the needs of embedded applications through a suitable choice of instruction set features, privileged modes, and debug methodology.
• We covered at a high level the organization of memories and RISC-V harts that provides efficient processing even with high-latency memory transactions.
• We looked at the instruction set customizations that handle RISC-V hart interaction with the memory systems and other harts.
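For readers unfamiliar with the custom opcode spaces mentioned in this talk, the sketch below shows how an instruction word in the custom-0 or custom-1 space is laid out. The two major opcode values and the R-type field positions come from the base RISC-V ISA; the choice of R-type, the `funct3`/`funct7` values, and the registers are assumptions made for illustration, since the slides do not give the RFPC encodings.

```python
# Major opcodes reserved for custom extensions in the base RISC-V ISA.
CUSTOM_0 = 0b0001011  # 0x0B
CUSTOM_1 = 0b0101011  # 0x2B


def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack the standard R-type fields into a 32-bit instruction word."""
    fields = (
        (opcode, 7, 0),   # bits 6:0
        (rd, 5, 7),       # bits 11:7
        (funct3, 3, 12),  # bits 14:12
        (rs1, 5, 15),     # bits 19:15
        (rs2, 5, 20),     # bits 24:20
        (funct7, 7, 25),  # bits 31:25
    )
    word = 0
    for value, width, shift in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word |= value << shift
    return word


def major_opcode(word):
    """Extract the 7-bit major opcode from an instruction word."""
    return word & 0x7F


# Hypothetical custom-1 operation with rd=x10, rs1=x11, rs2=x12.
# funct3=0 and funct7=0 are placeholders, not real RFPC encodings.
insn = encode_r_type(CUSTOM_1, rd=10, funct3=0, rs1=11, rs2=12, funct7=0)
insn_hex = f"{insn:#010x}"
```

A decoder that does not implement the extension sees only an unused major opcode, which is why these spaces can carry vendor-specific operations without colliding with standard instructions.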
11. ODSA Workgroup
By implementing open specifications contributed by participating companies, any vendor’s silicon die can become a building block for a chiplet-based SoC design
Working together to standardize processors, accelerators, and memory and I/O peripherals using optimal process nodes
Companies wishing to learn more, participate, and become an integral part of the ODSA Workgroup can inquire further at odsa@netronome.com or visit us in booth #407!
12. THANK YOU