These slides are part of a "Trends in Memory Disaggregation" webinar published in March 2021. You can watch the webinar recording here: https://youtu.be/g0QEX5qE8kE.
The presentation slides show how the Open Memory Interface (OMI) is a critical system-architecture building block toward our industry being able to easily build the Domain Specific Architectures of the future, as championed by computer-architecture luminaries John Hennessy and David Patterson.
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World

1. OMI - Open Memory Interface
The Missing Piece of a Disaggregated, Modular, Flexible & Composable Computing World
Allan Cantle - 3/25/2021
a.cantle@nallasway.com
2. What is Disaggregation?
An attempt to free up Stranded Resources for differing Workloads
Ethernet or PCIe, with future support for CXL
Resources are allocated & managed with composer software, allocating appropriate resources to each virtual-machine-driven workload.
Disaggregating local DDR memory into memory pools has proven to be more challenging:
• Latency critical
• RAS issues
[Diagram: a Classic Server with a Typical Converged Infrastructure (Compute, Memory, Storage, Accelerator and IO converged in one box) vs. Rack Scale Disaggregation (separate pools of Compute, Memory, Storage, Accelerator and IO resources)]
Key: C = Compute, M = Memory, S = Storage, A = Accelerator, IO = Input/Output
• Downsides of Rack Scale Disaggregation:
• Increased power through more data movement
• Increased stress on the network in order to enable software composability
Focus of this Webinar
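The composer-software idea above can be sketched as a simple allocator that carves pooled, disaggregated resources into per-workload allocations. This is a minimal illustration of the concept only; the names and pool sizes are assumptions, not any real composer API:

```python
# Minimal sketch of rack-scale composition: a composer allocates
# disaggregated resources (compute, memory, storage, IO) to workloads.
# All names and quantities here are illustrative, not a real composer API.

POOL = {"compute": 8, "memory_gb": 1024, "storage_tb": 64, "io_links": 16}

def compose(pool, request):
    """Allocate a workload's requested resources from the shared pool.

    Returns the allocation, or None if the pool cannot satisfy it.
    """
    if any(pool.get(k, 0) < v for k, v in request.items()):
        return None                      # not enough free resources
    for k, v in request.items():
        pool[k] -= v                     # carve resources out of the pool
    return dict(request)

vm = compose(POOL, {"compute": 2, "memory_gb": 256, "storage_tb": 8, "io_links": 4})
print(vm, POOL["memory_gb"])             # allocation granted; 768 GB remain pooled
```

The point of the sketch is the "stranded resources" argument: anything left in `POOL` after allocation remains available to other workloads rather than being trapped inside one converged server.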
3. What is Disaggregation?
An attempt to free up Stranded Resources for differing Workloads
[Diagram: the same Classic Server vs. Rack Scale Disaggregation comparison, now showing local memory disaggregated via Memory Inception over OpenCAPI (ThymesisFlow)]
Focus of this Webinar
4. Why do Workloads differ in resource needs?
Differing compute ratios - Ops : Bytes/sec : Bytes Capacity

Workload Type         | Processor Centric | Balanced (Classic) | Data Centric
Compute Ratio         | 100 : 1 : 0.01    | 1 : 1 : 1          | 0.01 : 1 : 100
Architectural Example | (M = HBM)         | (M = DRAM)         | (storage-centric diagram)

[Diagram: Application A composed of Algo 1 (100 : 1 : 0.01), Algo 2 (1 : 1 : 1), Algo 3 (0.01 : 1 : 100) and Algo 4 (100 : 1 : 0.01), connected by IO links]
Applications often have differing workload algorithms running in parallel, with differing IO bandwidth between them.
The compute ratio repeats itself hierarchically, in a fractal-like manner.
If you compose a computer to Application A's exact needs, you now have a "Domain Specific Architecture".
Building Application A's exact architecture requires Modularity, Flexibility & Composability EVERYWHERE.
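The slide's Ops : Bytes/sec : Bytes-capacity ratio can be computed directly from a workload's requirements by normalizing to memory bandwidth. The classifier below is a hypothetical illustration of that idea; the threshold value is an assumption, not something the slides define:

```python
def compute_ratio(ops_per_s, bytes_per_s, bytes_capacity):
    """Normalize a workload's needs to the Ops : Bytes/sec : Bytes ratio
    used in the slides, with memory bandwidth as the reference (= 1)."""
    return (ops_per_s / bytes_per_s, 1.0, bytes_capacity / bytes_per_s)

def classify(ratio, threshold=10.0):
    """Rough workload classification; the threshold is an illustrative
    assumption, chosen only to separate the slide's example ratios."""
    ops, _, cap = ratio
    if ops >= threshold:
        return "Processor Centric"       # e.g. 100 : 1 : 0.01
    if cap >= threshold:
        return "Data Centric"            # e.g. 0.01 : 1 : 100
    return "Balanced (Classic)"          # e.g. 1 : 1 : 1

# A workload needing 100 TOPS, 1 TB/s of bandwidth and 10 GB of capacity:
print(classify(compute_ratio(100e12, 1e12, 0.01e12)))  # Processor Centric
```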
5. Disaggregation without the Downsides
Physically Modular, Flexible and Composable
[Diagram: Application A's Algo 1 (100 : 1 : 0.01), Algo 2 (1 : 1 : 1), Algo 3 (0.01 : 1 : 100) and Algo 4 (100 : 1 : 0.01) mapped onto physically modular hardware, connected by IO links]
• Lego Block construction of Compute Ratios at the system level
• Domain Specific implementations could be quickly configured and tested
• Dense Modularity & Distributed Computing minimize data movement power
6. Disaggregation without the Downsides
Physically Modular, Flexible and Composable
(Build of the previous slide, showing Application A mapped onto the modular hardware)
7. DDR: a Parallel Interface in a Serial World
Unsuitable for physically Composable Systems
• Network, Memory, Media modules & IO use a common serial EDSFF interconnect
[Diagram: OCP NIC 3.0 and SNIA E1.S & E3.S modules (typically < 100W), OMI DDIMM vs. parallel DDR DIMM]
8. OMI DDIMM Overview
OMI offers far more than just Composability
• <10ns latency adder over a standard DDR4 RDIMM
• In production since mid-2019 from Samsung, SMART Modular & Micron
• Memory technology agnostic - e.g. easy processor migration from DDR4 to DDR5
• Up to 8x more memory bandwidth per processor - (2x BW/DDIMM + 4x no. of channels)
• Ecosystem enablement with FPGA Host and Buffer Bringup Platform
• Fully open-sourced IP for Host Controller and OMI Memory Buffer (RTL on GitHub)
• Functional and Memory Traffic Generator IP for testing purposes
[Photos: OMI DDIMM formats - 16GB/32GB (1U high), 64GB (2U high), 128GB (3U high); 32GB 3200 DDR4 DIMM for reference; 8 industry-standard DDR4 channels vs. 8 OMI channels (equivalent pin comparison); Apollo - FPGA Host Controller Bringup Platform; Gemini - OMI DDIMM with FPGA Buffer]
9. The OMI Advantage
Memory Bandwidth AND Capacity at LOW Cost
[Die photos, to scale (20pts : 1mm):
• IBM POWER10 die - two groups of 8x (8x1) OMI 25G DDIMM channels, 400GB/s each
• AMD EPYC Rome IO die - 4x (2x2) DDR4-3200 DIMM channels = 102GB/s
• NVidia Ampere die - 6x HBM2 stacks; 1x HBM2 = 8x (8x1) channels = 311GB/s]
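The bandwidth figures on this slide follow from simple per-channel arithmetic. The sketch below reproduces them under common assumptions (8-byte DDR4 bus per channel, 25 Gb/s OMI lanes counted in both directions); treat the results as back-of-envelope approximations rather than vendor specifications:

```python
# Back-of-envelope arithmetic behind the slide's bandwidth figures.
# Rates and widths are the commonly quoted ones; results are approximate.

def ddr4_gbs(mt_per_s, channels=1, bus_bytes=8):
    """DDR4: transfers/s x 8-byte data bus, per channel, in GB/s."""
    return mt_per_s * bus_bytes * channels / 1000

def omi_gbs(lane_gbit=25.0, lanes=8, channels=1, directions=2):
    """OMI: 8 serial lanes per channel, bandwidth counted both ways, in GB/s."""
    return lane_gbit / 8 * lanes * directions * channels

print(ddr4_gbs(3200, channels=4))   # 4 DDR4-3200 channels (EPYC Rome figure)
print(omi_gbs(channels=8))          # 8 OMI 25G channels (POWER10 figure)
```

The comparison makes the slide's point numerically: eight narrow serial OMI channels deliver roughly 4x the bandwidth of four parallel DDR4 channels on an equivalent pin budget.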
10. Fully Composable Compute Node Module
Leveraged from OCP's OAM Module - nicknamed OAM-HPC
• FPGA main processor example (Xilinx VU37P or VU13P)
• Supports many composable Memory / Storage / IO configurations
• 22x 2C connectors for Memory, Storage &/or IO, using the EDSFF TA-1002 interconnect
[Images: OAM-HPC module top view; bottom view; bottom view with many IO, Memory & Media configurations]
11. IBM POWER10 OAM-HPC Module
Leveraged from OCP's OAM Module - nicknamed OAM-HPC
• Form factor configurable for the IBM POWER10 processor
• Already supports the composable OMI DDIMM
• Cabled IO for OpenCAPI, SMP, and/or PCIe
• 22x 2C connectors for Memory, Storage &/or IO, using the EDSFF TA-1002 interconnect
[Images: OAM-HPC module top view; bottom view; bottom view populated with 16x OMI DDIMMs + cabled IO]