8.4: Radeon RX 5700 Series : The AMD 7nm Energy-Efficient High-Performance GPUs© 2020 IEEE International Solid-State Circuits Conference 1 of 38
AMD Radeon™ RX 5700 Series
7nm Energy-Efficient
High-Performance GPUs
Sal Dasgupta¹, Teja Singh², Ashish Jain², Samuel Naffziger³, Deepesh John², Chetan Bisht⁴, Pradeep Jayaraman¹, Michael Mantor⁴
¹AMD, Santa Clara, CA; ²AMD, Austin, TX; ³AMD, Fort Collins, CO; ⁴AMD, Orlando, FL
Presented at ISSCC 2020
Outline
• Overview of AMD Radeon™ RX 5700 Series
• AMD RDNA Architecture
• Power Management features
• GDDR6 (G6) PHY
• Physical design
AMD Radeon™ RX 5000 series
• GPUs are everywhere
• GPUs need to service a wide range of form factors and workloads
• Fundamental challenge is to get higher and higher performance at
lower and lower power
[Figure: PC Gaming, Content Creation, Console Gaming, Cloud Gaming, Mobile Devices]
Improvements
• Up to 1.5x greater performance per watt than its predecessor
• Up to 1.25x performance per clock compared to previous 14nm processors
• Up to 1.23x higher max frequencies than its predecessor
• Up to 1.23x lower power consumption than its predecessor
Achieved through
• 7nm process
• Higher clocks
• Focused design for lower dynamic power
• Intelligent SOC design with power and performance at the forefront
• Improved power management
• All new AMD RDNA graphics architecture – higher performance for
the same cycles
AMD Radeon™ RX 5700 series
See endnote RX-327, RX-325 and RX-362
AMD Radeon™ RX 5700 Series
Overview
AMD Radeon™ RX 5700 XT
• PCIe® Gen4 x16: 32 GB/s
• Display: DP 1.4, HDMI 4K 60 fps
• Multimedia: 4K H.264 encode/decode, H.265/HEVC encode/decode
• GDDR6: 256b, 14 Gbps
See endnote GD-81
AMD Radeon™ RX 5700 XT Floorplan
[Figure: die floorplan. Labeled regions: Graphics, Bus Interface, Display, G6 PHY (two), G6 Control]
AMD Radeon™ RX 5700 XT Summary
                        Radeon RX 5700 XT   Radeon RX Vega 64   Radeon RX 580
Compute Units           40                  64                  36
Texture Units           160                 256                 144
ROPs                    64                  64                  32
Memory Clocks           14 Gbps GDDR6       1.89 Gbps HBM2      8 Gbps GDDR5
Memory Bus Width        256 bit             2048 bit            256 bit
Frame Buffer            8 GB                8 GB                8 GB
Boost Clock             1905 MHz            1546 MHz            1257 MHz
Typical Board Power     225 W               295 W               185 W
Transistor Count        10.3B               12.5B               5.7B
Manufacturing Process   TSMC 7nm            GF 14nm             GF 14nm
Architecture            RDNA                GCN                 GCN
[Figure: Delivered Performance, GCN vs. RDNA: +50% same-power, same-configuration performance gains]
[Figure: Performance Contributors (0% to 100%): performance-per-clock enhancement, 7nm process, additional frequency and power improvement]
See endnote GD-151, RX-325
AMD Radeon™ RX 5700 Series
RDNA Graphics Architecture
40 RDNA Compute Units
• 80 Scalar Processors
• 2560 Stream Processors
• 160 64b Bilinear Filter units
Multilevel Cache
• 4MB L2, 512KB L1, (V$, I$, K$) L0
• 2x V$L0 Load Bandwidth
• DCC Everywhere
Streamlined Graphics Engine
• Geometry Engine (4 Prim shader out, 8 Prim
shader in)
• 64 Pixel Units
• 4 Asynchronous Compute Engines
Designed for higher frequencies at lower power
AMD Radeon™ RX 5700 XT
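The unit counts on this slide can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes two SIMD32 units per compute unit, one scalar unit per SIMD, and four filter units per compute unit, and the peak FP32 figure (one FMA, counted as 2 FLOPs, per stream processor per clock at the 1905 MHz boost clock) is derived here rather than quoted from the deck.

```python
# Cross-check of the slide's unit counts. The per-CU breakdown below is an
# assumption chosen to be consistent with the deck, not a quoted spec.
COMPUTE_UNITS = 40
SIMDS_PER_CU = 2          # assumption: dual SIMD32 per compute unit
LANES_PER_SIMD = 32
FILTER_UNITS_PER_CU = 4   # assumption: four bilinear filter units per CU
BOOST_CLOCK_MHZ = 1905    # boost clock from the summary table

stream_processors = COMPUTE_UNITS * SIMDS_PER_CU * LANES_PER_SIMD
scalar_processors = COMPUTE_UNITS * SIMDS_PER_CU
filter_units = COMPUTE_UNITS * FILTER_UNITS_PER_CU

# Peak FP32 rate, counting one fused multiply-add as two FLOPs per lane per clock.
peak_fp32_tflops = stream_processors * 2 * BOOST_CLOCK_MHZ / 1e6

print(stream_processors, scalar_processors, filter_units)
print(f"peak FP32: {peak_fp32_tflops:.2f} TFLOPS")
```

Under these assumptions the script reproduces the 2560 stream processors, 80 scalar processors, and 160 filter units listed above.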
AMD Radeon™ RX 5700 XT Floorplan
AMD Radeon™ RX 5700 XT Functional Floorplan
[Block diagram. Labeled blocks: Infinity Fabric; PCIe Gen4; Display Engine; Multimedia Engine; Command Processor; Geometry Processor; HWS; DMA; ACE; two Shader Engines containing four Prim Units, four Rasterizers, four RB groups, four Compute Unit groups, and four L1 caches; sixteen L2 slices; four 64-bit Memory Controllers]
RDNA – Work Group Processor
Up to 20 wave controllers
Improved instruction arbitration
10KB scalar register file
• 128 32b registers per wavefront
128 KB Vector register file
~2X instruction rate vs GCN
• Dual SIMD32
Single cycle issue
• Wave32 on SIMD32
Bytes Per Flop
• 128B Load/Store
• 64B Filter Rate
[Figure: RDNA Workgroup Processor. Labeled blocks: Scalar Registers, Scalar Units, Vector ALUs (SIMD32), Shader Sequencers, Texture Mapping Units, Vector Registers, Texture L0 Cache, Scalar Data Cache, Local Data Share, Shader Instruction Cache]
32 wide single and dual half ALU
• Full rate 32b FMA, Dual 16b FMA
8 wide transcendental ALU
• Single cycle issue
• Multi-cycle co-execution
SIMD Unit WGP
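The register-file sizes quoted on this slide follow from the wave and register counts. A minimal sketch of that arithmetic, assuming the figures are per SIMD and that every register is 32 bits (4 bytes) wide per lane; both points are assumptions made for this illustration:

```python
# Register-file sizing check. Assumes per-SIMD figures and 4-byte registers.
WAVE_CONTROLLERS = 20      # "Up to 20 wave controllers"
SGPRS_PER_WAVE = 128       # "128 32b registers per wavefront"
BYTES_PER_REGISTER = 4     # 32 bits

# Scalar register file: every resident wave gets its own 128 scalar registers.
scalar_rf_bytes = WAVE_CONTROLLERS * SGPRS_PER_WAVE * BYTES_PER_REGISTER

# Vector register file: 128 KB shared across 32 lanes of 4-byte registers.
VECTOR_RF_BYTES = 128 * 1024
LANES = 32                 # SIMD32
vector_registers_per_lane = VECTOR_RF_BYTES // (LANES * BYTES_PER_REGISTER)

print(scalar_rf_bytes // 1024, "KB scalar register file")
print(vector_registers_per_lane, "vector registers per lane")
```

With these assumptions, 20 waves times 128 scalar registers gives exactly the 10 KB scalar register file, and the 128 KB vector register file holds 1024 wave32-wide registers.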
[Figure: comparison of the GCN Compute Unit, the RDNA Compute Unit, and the RDNA WGP. Labeled blocks include Schedulers, Scalar Units, Scalar Registers, Vector Registers, Vector ALUs / Stream Processors, Branch & Message Unit, Local Data Share, Scalar Data Cache, Shader Instruction Cache, L1 Cache, Texture Mapping Units, Texture Filter Units, and Texture Fetch Load/Store Units]
GCN Instruction Issue
Wave64, 4-cycle issue:
• Cycle N: lanes 15–0
• Cycle N+1: lanes 31–16
• Cycle N+2: lanes 47–32
• Cycle N+3: lanes 63–48
[Figure: SIMD0 through SIMD3, each with its own VGPRs and operand gathering on a 4-cycle issue, plus one shared scalar unit that serves SIMD0 waves in cycle 0, SIMD1 waves in cycle 1, SIMD2 waves in cycle 2, and SIMD3 waves in cycle 3]
• All work-items of a wave64 have an opportunity to do work once every 4 clocks due to hardware interleaving
• Special Function Unit: alternate execution unit running at ¼ rate
• A wave from a SIMD has an opportunity to accomplish a scalar instruction once every 4 clocks
RDNA Instruction Issue
[Figure: SIMD0 and SIMD1, each with its own scalar unit (S), VGPRs, and operand gathering on a 1-cycle issue]
• SIMD0 Wave32 and SIMD1 Wave32: every-cycle issue
  • Vector instruction issue any cycle
  • Or SFU issue once every 4 cycles
• Vector Units: all work-items of one wave32 have an opportunity to do work every clock
• Special Function Unit uses 1 issue cycle and then executes in parallel
• Each SIMD equipped with a scalar unit for an instruction execution every cycle
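The difference between the two issue schemes comes down to how many passes a wave needs through the vector unit. The sketch below is a simplified model, not a description of the hardware: it takes the 16-lanes-per-cycle GCN cadence and the SIMD32 width from the slides and treats the number of passes as wave size divided by SIMD width. The function names are hypothetical.

```python
import math


def issue_passes(wave_size: int, simd_width: int) -> int:
    """Number of passes needed to push every work-item through the SIMD."""
    return math.ceil(wave_size / simd_width)


def lanes_for_pass(pass_index: int, simd_width: int) -> range:
    """Work-item lanes handled in a given pass (pass 0 is cycle N)."""
    start = pass_index * simd_width
    return range(start, start + simd_width)


# GCN: wave64 on a 16-lane SIMD, RDNA: wave32 or wave64 on a 32-lane SIMD.
print("GCN wave64 :", issue_passes(64, 16), "passes")
print("RDNA wave32:", issue_passes(32, 32), "pass")
print("RDNA wave64:", issue_passes(64, 32), "passes")

# Cycle N+1 on GCN covers lanes 16..31, matching the slide.
second = lanes_for_pass(1, 16)
print("GCN cycle N+1 lanes:", second[0], "to", second[-1])
```

In this model a wave64 on GCN takes four passes, a wave32 on RDNA takes one, and a wave64 on RDNA takes two (the lo and hi halves).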
RDNA Instruction Issue Example
Instruction sequence:
s_add_i32 s0, s1, s2
v_mul_f32 v0, v1, s0
v_add_f32 v5, v4, v3
v_sub_f32 v6, v7, v0
Issue timelines (one row per cycle, cycles 0–15):
Cycle  4-cycle issue            1-cycle issue                     1-cycle issue, lo/hi halves
0      s_add_i32 s0, s1, s2     s_add_i32 s0, s1, s2              s_add_i32 s0, s1, s2
1      …                        … (salu dependency stall on S0)   … (salu dependency stall on S0)
2      …                        v_mul_f32 v0, v1, s0              v_mul_f32 v0, v1, s0 (lo)
3      …                        v_add_f32 v5, v4, v3              v_mul_f32 v0, v1, s0 (hi)
4      v_mul_f32 v0, v1, s0     … (valu dependency stall on V0)   v_add_f32 v5, v4, v3 (lo)
5      … (simd busy 4 cycles)   …                                 v_add_f32 v5, v4, v3 (hi)
6      …                        …                                 … (valu dependency stall on V0 lo)
7      …                        v_sub_f32 v6, v7, v0              v_sub_f32 v6, v7, v0 (lo)
8      v_add_f32 v5, v4, v3                                       v_sub_f32 v6, v7, v0 (hi)
9      …
10     …
11     …
12     v_sub_f32 v6, v7, v0
13     …
14     …
15     …
Callouts: SHORTEST WAVE ISSUE LATENCY; 44% REDUCTION IN ISSUE CYCLES
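The cycle counts in this example can be reproduced with a small in-order issue model. This is a sketch under stated assumptions, not AMD's scheduler: the scalar and vector result latencies (2 and 5 cycles) are values fitted to the stall rows on the slide, and `Instr`, `fixed_cadence_cycles`, and `single_cycle_issue_cycles` are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    srcs: tuple


# The four-instruction sequence from the slide.
PROGRAM = [
    Instr("s_add_i32", "s0", ("s1", "s2")),
    Instr("v_mul_f32", "v0", ("v1", "s0")),
    Instr("v_add_f32", "v5", ("v4", "v3")),
    Instr("v_sub_f32", "v6", ("v7", "v0")),
]


def fixed_cadence_cycles(program, cadence=4):
    """Every instruction holds the wave's issue slot for `cadence` cycles."""
    return len(program) * cadence


def single_cycle_issue_cycles(program, passes=1, salu_latency=2, valu_latency=5):
    """In-order issue, one slot per cycle, stalling until sources are ready.

    `passes` is the number of issue passes per vector instruction
    (1 for wave32, 2 for the lo/hi halves of a wave64 on a SIMD32).
    Both latencies are assumptions fitted to the slide's timelines.
    """
    ready = {}  # (register, half) -> first cycle the value can be consumed
    cycle = 0
    for ins in program:
        is_vector = ins.op.startswith("v_")
        for half in range(passes if is_vector else 1):
            for src in ins.srcs:
                # Scalar values are shared by both halves of a wave64.
                key = (src, 0) if src.startswith("s") else (src, half)
                cycle = max(cycle, ready.get(key, 0))
            latency = valu_latency if is_vector else salu_latency
            ready[(ins.dst, half)] = cycle + latency
            cycle += 1  # the issue slot is free again on the next cycle
    return cycle


four_cycle = fixed_cadence_cycles(PROGRAM)
wave32 = single_cycle_issue_cycles(PROGRAM, passes=1)
wave64 = single_cycle_issue_cycles(PROGRAM, passes=2)
reduction = round(100 * (1 - wave64 / four_cycle))
print(four_cycle, wave32, wave64, f"{reduction}% fewer issue cycles")
```

With the fitted latencies the model gives 16 cycles for the 4-cycle cadence, 8 for single-cycle issue, and 9 for the lo/hi case, and 9 versus 16 cycles corresponds to the 44% reduction called out on the slide.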
RDNA Redesigned Cache Hierarchy
New L1 Cache Hierarchy
[Block diagram. Labeled blocks: PCIe® 4.0, SOC Fabric, GDDR6, L2, L1, Command Interfaces, Async Compute, Geometry, Rasterizer & Render Backends, Texture, Shader Complex]
RDNA Cache Hierarchy
• Doubled the Load Bandwidth
from L0 to ALU
• Improved BW Amplification
• Reduced Latency and Power
• Reduced Congestion at L2 Level
• Reduced Data Movement
[Figure: cache hierarchy bandwidth diagram with link labels 4x64B/C, 16x32/C, 32B/CLK, 64B/CLK, and 128B/CLK, and two 2X callouts; Relative Cache Latency chart showing -24%, -21%, and -7%]
See RX-329 in Endnotes.
AMD Radeon™ RX 5700 Series
Power Management
• Main functions
  • Manage clocks/voltages to maximize performance during active workloads
  • Draw minimum power during low-activity conditions
• Challenges
  • Highly variable GPU workloads
  • Parallel work leads to high power/current demand
  • Ever-increasing need for memory bandwidth
• Power features include
  • AVFS to choose the optimal per-part voltage
  • DVFS to choose the best operating point given the current environment
  • Voltage droop mitigation and impact reduction
  • Agile responses to the current draw demands of the moment
  • Graceful throttling of the graphics core at thermal limits
  • Aggressive clock and power gating
Power Management: Fine-grained DPM
Previous Generation: Coarse-grained DPM
• On previous architectures, operation was limited to
a few (typically 8) pre-defined frequency values
• Limited choice for Power Management Controller to
choose from
• Power/Thermal constrained effective frequency
determined by dithering between neighboring
coarse states
AMD Radeon™ RX 5700 Series: Fine-grained DPM
• Much finer-grained DPM state selection across the
V/F curve
• Improved perf/W efficiency by up to 5% as
compared to previous generation by staying on the
optimal curve
• More accurate frequency selection between what
the workload needs and what it gets
[Figure: V/F curve from Fmin through Fidle to Fmax, contrasting coarse-grained DPM (discrete states DPM0 to DPM7) with fine-grained DPM]
*Based on AMD internal data.
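One way to see why dithering between neighboring coarse states costs efficiency is to compare its time-averaged power with the power of sitting directly on the V/F curve at the same effective frequency. The sketch below is purely illustrative: the V/F curve, the eight state frequencies, and the f·V² power model are assumptions invented for the example, not AMD data.

```python
# Hypothetical numbers throughout: the V/F curve and the DPM states are
# illustrative assumptions, not AMD data.
COARSE_STATES_MHZ = [300, 600, 900, 1200, 1400, 1600, 1800, 1900]  # 8 states


def voltage(f_mhz: float) -> float:
    """Assumed convex V/F curve: voltage rises faster at high frequency."""
    return 0.70 + 0.25 * (f_mhz / 2000.0) ** 2


def relative_power(f_mhz: float) -> float:
    """Dynamic power taken as proportional to f * V^2."""
    return f_mhz * voltage(f_mhz) ** 2


def dithered_power(target_mhz: float, states=COARSE_STATES_MHZ) -> float:
    """Average power when the target is met by time-slicing two neighbors."""
    lo = max(s for s in states if s <= target_mhz)
    hi = min(s for s in states if s >= target_mhz)
    if lo == hi:
        return relative_power(target_mhz)
    duty_hi = (target_mhz - lo) / (hi - lo)  # fraction of time spent at hi
    return duty_hi * relative_power(hi) + (1 - duty_hi) * relative_power(lo)


target = 1500.0
coarse = dithered_power(target)
fine = relative_power(target)
print(f"dithering overhead at {target:.0f} MHz: {100 * (coarse / fine - 1):.2f}%")
```

Because the assumed power curve is convex in frequency, the dithered average always sits at or above the on-curve value; the size of the gap depends entirely on the made-up curve and state spacing used here.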
Power Management – Per Part Fmax
Previous generations
• Fmax determined by slowest part of
distribution
• Lower Cac workloads may leave power on
the table for a large population of parts
AMD Radeon™ RX 5700 Series
• Each individual part allowed to achieve
max potential (up to 15% higher) by
selecting its own Vmax-limited Fmax
based on the speed of the part
• Enables applications with lower Cac to
sustain higher clocks rather than be limited
to artificially low limits set by slowest parts
Based on AMD internal data
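The contrast between a population-wide Fmax and a per-part, Vmax-limited Fmax can be sketched as follows. Everything numeric here is hypothetical: the linear V/F model, the voltage ceiling, the 25 MHz step, and the three example parts are assumptions made only to illustrate the selection rule.

```python
# Hypothetical per-part Fmax selection. All constants are illustrative.
VMAX_MV = 1200    # assumed voltage ceiling
V0_MV = 600       # assumed voltage intercept of the linear V/F model
STEP_MHZ = 25     # assumed frequency granularity


def vmax_limited_fmax(mv_per_100mhz: int) -> int:
    """Highest frequency step whose required voltage stays at or below Vmax.

    A part is modeled as needing V0 + mv_per_100mhz * f / 100 millivolts,
    so slower silicon has a larger mv_per_100mhz.
    """
    limit_mhz = (VMAX_MV - V0_MV) * 100 // mv_per_100mhz
    return limit_mhz - limit_mhz % STEP_MHZ


# Three made-up parts spanning the speed distribution.
parts = {"fast": 30, "typical": 32, "slow": 34}

per_part_fmax = {name: vmax_limited_fmax(slope) for name, slope in parts.items()}
population_fmax = min(per_part_fmax.values())  # previous approach: slowest part rules

for name, fmax in per_part_fmax.items():
    uplift = 100 * (fmax / population_fmax - 1)
    print(f"{name:8s} per-part Fmax {fmax} MHz ({uplift:.1f}% above population limit)")
```

In this toy population the slowest part sets a 1750 MHz limit under the previous scheme, while the fastest part can select 2000 MHz on its own curve.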
AMD Radeon™ RX 5700 Series
GDDR6 Interface
GDDR6 Memory
• 14 Gbps, 256b, 448 GB/s
• Up to 75% BW per pin improvement over GDDR5
• Up to 60% performance/Watt
Based on AMD internal data
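The bandwidth figures on this slide follow directly from the per-pin data rate and the bus width. A short sketch of that arithmetic, using the 14 Gbps GDDR6 and 8 Gbps GDDR5 rates and the 256-bit bus width from the summary table; the helper name is hypothetical:

```python
def peak_bandwidth_gb_s(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate times pins, over 8 bits per byte."""
    return gbps_per_pin * bus_width_bits / 8


gddr6 = peak_bandwidth_gb_s(14, 256)   # RX 5700 XT
gddr5 = peak_bandwidth_gb_s(8, 256)    # RX 580, for comparison
per_pin_gain = 14 / 8 - 1              # GDDR6 vs. GDDR5 data rate per pin

print(f"GDDR6: {gddr6:.0f} GB/s, GDDR5: {gddr5:.0f} GB/s")
print(f"per-pin bandwidth improvement: {per_pin_gain:.0%}")
```

The result matches the 448 GB/s and 75% per-pin figures stated above.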
G6 PHY
Read Data Strobe (RDQS) mode
to save power when high
memory bandwidth is not
required
T-coil provides bandwidth
enhancement and improved
return loss enabling the high data
rates on a single-ended interface
• up to 16% in height
• up to 26% in width
[Figure annotations: 40.2, 0.300, 50.8, 0.349]
Based on AMD internal data
AMD Radeon™ RX 5700 Series
Physical Design
AMD Radeon™ RX 5700 Physical Design
Physical Design challenges
• New architecture
• Frequency uplift beyond natural process uplift
• Large design (10.3B transistors) with wide busses and complex crossbar structures
• Decreasing dynamic switching capacitance while providing that frequency uplift
• Operational logistics of managing such a large design
Approach
• High-performance clock distribution
• Intelligent SRAM generation
• Automated place and route while offering hooks for customization
• Exploiting the benefits of the technology while compensating for the new challenges it brings
• Careful use of mixed VTH cells to close timing gaps while maintaining power requirements
• Balancing resource constraints and the desire for physical reuse against area and performance targets
• Power aware floorplanning and bus planning working in conjunction with logic design teams
Global Clock Distribution Optimizations
Global clock distribution adopts low skew multiple mesh design style built of highly optimized configurable
clock cells
• Smaller mesh regions and optimized driver design reduce global distribution skew and variability
costs by 30% for most synchronous paths
Clock mesh wire power reduced up to 40% (normalized to area) by optimizing high level metal usage and
reducing parasitic capacitance in clock drivers
[Figure: clock meshes MESH1 through MESH7 connected by clock spines]
Local Clock Distribution Optimizations
Configurable mixed-depth structured clock tree adopted for local clock distribution
• Reduces median clock insertion by up to 50% which helps reduce jitter and PVT variability
• Multiple levels of clock gating provide both coarse and fine control
Bottom up expansion of clock tree adopted instead of region-based cloning
• Local clock tree CAC decreased by up to 10% with load-based cloning
Bus planning
[Figure: bus widths]
• Large busses (up to 2048b) and complex
crossbar structures
• Very large number of physical partitions
need to be managed in an operational
cadence
• 60 unique designs with ~1-2M instance
count, despite considerable reuse.
• Requires prototyping and proving of
achieving performance targets well in
advance of netlist drops
AMD Radeon™ RX 5700 Series
Conclusion
AMD Radeon™ RX 5700 XT – RDNA
Performance Improvements
Based on internal testing. See endnote RX-363
56CU vs. 40CU
300W TBP vs. 225W TBP
Conclusion
AMD Radeon™ RX 5700 Series increased frequency while lowering
active power and improving performance per clock
Enabled by
• Performance and power efficient next-generation AMD RDNA architecture
• Increased memory bandwidth while maintaining power envelope and keeping
costs low
• Advanced power management techniques that allowed residency in the
optimum power states while not limiting performance to the worst of the
population
• Attaining timing closure through reducing skew and jitter, improved bus
planning, judicious use of Vt cells, and innovative floorplanning
See endnote RX-327, RX-325 and RX-362
Acknowledgment
We would like to thank our talented AMD design teams
across Austin, Bangalore, Boston, Fort Collins, Hyderabad,
Markham, Santa Clara, and Shanghai.
Notes
• GD-81: HEVC (H.265), H.264, and VP9 acceleration are subject to and not operable without inclusion/installation of compatible HEVC players. GD-81
• GD-151: Boost Clock Frequency is the maximum frequency achievable on the GPU running a bursty workload. Boost clock achievability, frequency, and sustainability will vary based on several factors, including but not limited to: thermal conditions and variation in applications and workloads.
• RX-325: Testing done by AMD performance labs 5/23/19, using the Division 2 @ 25x14 Ultra settings. Performance may vary based on use of latest drivers. RX-325
• RX-327: Testing done by AMD performance labs 5/23/19, showing a geomean of 1.25x performance per clock across 30 different games @ 4K Ultra, 4xAA settings. Performance may vary based on use of latest drivers. RX-327
• RX-329: Testing conducted by AMD Performance Labs as of 05/30/2019 on Radeon RX 5700XT with AMD Driver 19.10 (1902270946) on Intel i7-6900k, and on Radeon Vega Frontier Edition with AMD Driver 19.30 (1904231814) on Intel i7-5960k. Both systems used 2x8GB DDR4 2133 MHz RAM, Asus ROG Rampage V Edition Motherboard, and Windows 10 Enterprise. Performance may vary. RX-329
• RX-362: Testing done by AMD performance labs on June 4, 2019. Systems were tested with: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz (6 core) with 16GB DDR4 @ 2133 MHz using an Asus X99-E Motherboard running Windows 10 Enterprise 64-bit (Ver. 1809, build 17763.053). Using the following graphics cards: Navi 10 (Driver 19.30_1905161434 (CL# 1784070)) with 40 compute units, versus a Vega 64 (Driver 19.4.1) with 40 compute units enabled. Breakdown based on AMD internal data June 4, 2019. Performance may vary. RX-362
• RX-363: Testing done by AMD performance labs 5/30/2019 on Core i9-9900K (3.6 GHz), 16GB DDR4-3200MHz, GIGABYTE Z390 AORUS ELITE, Win 10 64-bit, AMD Driver 19.30 for RX5700, and 19.10-190502a for Vega 56. Measuring FPS using: Dirt Rally 2, Sid Meier's Civilization 6, Metro Exodus, Tom Clancy's Ghost Recon Wildlands, Shadow of the Tomb Raider, Battlefield 5, Assassin's Creed Odyssey, Call of Duty: Black Ops 4, The Division 2, Far Cry New Dawn. All at max settings. PC manufacturers may vary configurations yielding different results. Performance may vary based on use of latest drivers. RX-363
Disclaimer and Endnotes
DISCLAIMER
The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the
preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise
correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of
this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with
respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any
intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed
agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
All rights reserved. AMD, the AMD Arrow logo, combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this
publication are for identification purposes only and may be trademarks of their respective companies.

GPU Compute in Medical and Print Imaging
 
Enabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the DatacenterEnabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the Datacenter
 
Lessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB NetworkLessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB Network
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs

  • 1. 8.4: Radeon RX 5700 Series : The AMD 7nm Energy-Efficient High-Performance GPUs © 2020 IEEE International Solid-State Circuits Conference 1 of 38
      AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
      Sal Dasgupta¹, Teja Singh², Ashish Jain², Samuel Naffziger³, Deepesh John², Chetan Bisht⁴, Pradeep Jayaraman¹, Michael Mantor⁴
      ¹AMD, Santa Clara, CA; ²AMD, Austin, TX; ³AMD, Fort Collins, CO; ⁴AMD, Orlando, FL
      Presented at ISSCC 2020
  • 2. Outline
      • Overview of AMD Radeon™ RX 5700 Series
      • AMD RDNA Architecture
      • Power Management features
      • GDDR6 (G6) PHY
      • Physical design
  • 3. AMD Radeon™ RX 5000 series
      • GPUs are everywhere
      • GPUs need to service a wide range of form factors and workloads
      • Fundamental challenge is to get higher and higher performance at lower and lower power
      Figure labels: PC Gaming, Content Creation, Console Gaming, Cloud Gaming, Mobile Devices
  • 4. AMD Radeon™ RX 5700 series
      Improvements
      • Up to 1.5x greater performance per watt than its predecessor
      • Up to 1.25x performance per clock compared to previous 14nm processors
      • Up to 1.23x higher max frequencies than its predecessor
      • Up to 1.23x lower power consumption than its predecessor
      Achieved through
      • 7nm process
      • Higher clocks
      • Focused design for lower dynamic power
      • Intelligent SOC design with power and performance at the forefront
      • Improved power management
      • All new AMD RDNA graphics architecture – higher performance for the same cycles
      See endnotes RX-327, RX-325 and RX-362
  • 5. AMD Radeon™ RX 5700 Series Overview
  • 6. AMD Radeon™ RX 5700 XT
      • PCIe® Gen4 x16: 32 GB/s
      • Display: DP1.4, HDMI 4K 60fps
      • Multimedia: 4K H264 Encode/Decode, H265/HEVC Encode/Decode
      • GDDR6: 256b, 14 Gbps
      See endnote GD-81
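The 32 GB/s figure quoted for PCIe® Gen4 x16 on the slide above can be sanity-checked with a little arithmetic. The sketch below assumes the standard PCIe 4.0 link parameters (16 GT/s per lane, 128b/130b line encoding), which the slide itself does not state; the function name is illustrative. Figures are per direction.

```python
def pcie_gbytes_per_s(gt_per_s: float, lanes: int,
                      payload_bits: int = 128, line_bits: int = 130) -> float:
    """Per-direction link bandwidth in GB/s after line-encoding overhead."""
    return gt_per_s * lanes * payload_bits / line_bits / 8


# Assumed PCIe 4.0 parameters: 16 GT/s per lane, 16 lanes.
raw = pcie_gbytes_per_s(16.0, 16, payload_bits=1, line_bits=1)  # no encoding overhead
usable = pcie_gbytes_per_s(16.0, 16)                             # 128b/130b encoding

print(f"raw {raw:.1f} GB/s, after 128b/130b encoding {usable:.2f} GB/s")
```

Under these assumptions the slide's 32 GB/s corresponds to the raw line rate; the encoded payload rate is slightly lower.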
  • 7. AMD Radeon™ RX 5700 XT Floorplan
      Figure labels: Graphics, Bus Interface, Display, G6 PHY (two regions), G6 Control
  • 8. AMD Radeon™ RX 5700 XT Summary
                                AMD Radeon RX 5700 XT   AMD Radeon RX Vega 64   AMD Radeon RX 580
      Compute Units             40                      64                      36
      Texture Units             160                     256                     144
      ROPs                      64                      64                      32
      Memory Clocks             14 Gbps GDDR6           1.89 Gbps HBM2          8 Gbps GDDR5
      Memory Bus Width          256 bit                 2048 bit                256 bit
      Frame Buffer              8 GB                    8 GB                    8 GB
      Boost Clock               1905 MHz                1546 MHz                1257 MHz
      Typical Board Power       225 W                   295 W                   185 W
      Transistor Count          10.3B                   12.5B                   5.7B
      Manufacturing Process     TSMC 7nm                GF 14nm                 GF 14nm
      Architecture              RDNA                    GCN                     GCN
      Chart, Performance Contributors (0% to 100%): Performance per Clock Enhancement, 7nm process, Additional Frequency and Power Improvement
      Chart, Same-Power, Same-Configuration Performance Gains: Delivered Performance, GCN vs RDNA, +50%
      See endnotes GD-151, RX-325
  • 9. AMD Radeon™ RX 5700 Series RDNA Graphics Architecture
  • 10. AMD Radeon™ RX 5700 XT
      40 RDNA Compute Units
      • 80 Scalar Processors
      • 2560 Stream Processors
      • 160 64b Bilinear Filter units
      Multilevel Cache
      • 4MB L2, 512KB L1, (V$, I$, K$) L0
      • 2x V$L0 Load Bandwidth
      • DCC Everywhere
      Streamlined Graphics Engine
      • Geometry Engine (4 Prim shader out, 8 Prim shader in)
      • 64 Pixel Units
      • 4 Asynchronous Compute Engines
      Designed for higher frequencies at lower power
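The unit counts on this slide divide evenly across the 40 compute units, and together with the 1905 MHz boost clock from the summary table they imply a peak FP32 rate. The sketch below is only that arithmetic; counting one fused multiply-add as two floating-point operations is an assumption of the sketch, not a statement from the deck.

```python
COMPUTE_UNITS = 40
STREAM_PROCESSORS = 2560
SCALAR_PROCESSORS = 80
BILINEAR_FILTER_UNITS = 160
BOOST_CLOCK_GHZ = 1.905   # 1905 MHz boost clock from the summary table
FLOPS_PER_FMA = 2         # assumption: one FMA counted as multiply + add

# Per-compute-unit resources implied by the totals.
per_cu = {
    "stream_processors": STREAM_PROCESSORS // COMPUTE_UNITS,
    "scalar_processors": SCALAR_PROCESSORS // COMPUTE_UNITS,
    "bilinear_filter_units": BILINEAR_FILTER_UNITS // COMPUTE_UNITS,
}

# Peak FP32 throughput if every stream processor retires one FMA per clock.
peak_fp32_gflops = STREAM_PROCESSORS * FLOPS_PER_FMA * BOOST_CLOCK_GHZ

print(per_cu)
print(f"peak FP32: {peak_fp32_gflops:.1f} GFLOPS")
```

The boost clock is a burst maximum (see endnote GD-151), so sustained throughput would be lower.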
  • 11. AMD Radeon™ RX 5700 XT Floorplan
  • 12. AMD Radeon™ RX 5700 XT Functional Floorplan
      Figure labels: Infinity Fabric, PCIe Gen4, Display Engine, Multimedia Engine, Geometry Processor, Command Processor, HWS, DMA, ACE, Shader Engine (x2), Compute Units (x4), L1 (x4), Prim Unit (x4), Rasterizer (x4), RBs (x4), L2 (x16), 64-bit Memory Controller (x4)
  • 13. AMD Radeon™ RX 5700 XT Functional Floorplan
      Figure: same block labels as the previous slide
  • 14. RDNA – Work Group Processor
      • Up to 20 wave controllers
      • Improved instruction arbitration
      • 10KB scalar register file
          • 128 32b registers per wavefront
      • 128 KB vector register file
      • ~2X instruction rate vs GCN
          • Dual SIMD32
      • Single cycle issue
          • Wave32 on SIMD32
      • Bytes Per Flop
          • 128B Load/Store
          • 64B Filter Rate
      SIMD Unit
      • 32 wide single and dual half ALU
          • Full rate 32b FMA, Dual 16b FMA
      • 8 wide transcendental ALU
          • Single cycle issue
          • Multi-cycle co-execution
      Figure: RDNA Workgroup Processor (WGP). Block labels: Scalar Units, Scalar Registers, Vector ALUs (SIMD32), Vector Registers, Shader Sequencers, Texture Mapping Units, Texture L0 Cache, Scalar Data Cache, Local Data Share, Shader Instruction Cache
  • 15. GCN Compute Unit, RDNA Compute Unit and RDNA WGP
      Figure: block diagrams. Labels in extraction order: Schedulers, Local Data Share, Scalar Data Cache, Shader Instruction Cache, Texture Mapping Units, Texture Filter Units, Stream Processors, Vector Registers, Scalar Registers, Scalar Units; Scheduler, Local Data Share, Texture Filter Units, L1 Cache, Vector ALU, Texture Fetch Load/Store Units, Scalar Registers, Scalar Unit, Vector Registers, Vector Units, Branch & Message Unit
  • 16. GCN Instruction Issue
      Wave64, 4-cycle issue: cycle N handles lanes 15-0, cycle N+1 lanes 31-16, cycle N+2 lanes 47-32, cycle N+3 lanes 63-48
      Figure: SIMD0 to SIMD3, each with its own VGPRs and operand gathering (4-cycle issue), plus one shared scalar unit that serves SIMD0 waves in cycle 0, SIMD1 waves in cycle 1, SIMD2 waves in cycle 2 and SIMD3 waves in cycle 3
      • All work-items of a wave64 have an opportunity to do work once every 4 clocks due to hardware interleaving
      • Special Function Unit: alternate execution unit running at ¼ rate
      • A wave from a SIMD has an opportunity to accomplish a scalar instruction once every 4 clocks
  • 17. RDNA Instruction Issue
      Figure: SIMD0 and SIMD1, each with its own scalar unit (S), VGPRs and operand gathering (1-cycle issue)
      • SIMD0 and SIMD1, wave32: every-cycle issue. A vector instruction can issue on any cycle, or an SFU instruction once every 4 cycles
      • Vector Units: all work-items of one wave32 have an opportunity to do work every clock
      • Special Function Unit uses 1 issue cycle and then executes in parallel
      • Each SIMD is equipped with a scalar unit for an instruction execution every cycle
  • 18. RDNA Instruction Issue Example
      Cycle axis: 0 to 15. Each row is one issue timeline; "…" marks a cycle in which no new instruction issues.
      Row 1: s_add_i32 s0, s1, s2 | … | … | … | v_mul_f32 v0, v1, s0 | … (simd busy 4 cycles) | … | … | v_add_f32 v5, v4, v3 | … | … | … | v_sub_f32 v6, v7, v0 | … | … | …
      Row 2: s_add_i32 s0, s1, s2 | v_mul_f32 v0, v1, s0 | v_add_f32 v5, v4, v3 | v_sub_f32 v6, v7, v0
      Row 3: s_add_i32 s0, s1, s2 | … (salu dependency stall on S0) | v_mul_f32 v0, v1, s0 | v_add_f32 v5, v4, v3 | … (valu dependency stall on V0) | … | … | v_sub_f32 v6, v7, v0
      Row 4: s_add_i32 s0, s1, s2 | … (salu dependency stall on S0) | v_mul_f32 v0, v1, s0 (lo) | v_mul_f32 v0, v1, s0 (hi) | v_add_f32 v5, v4, v3 (lo) | v_add_f32 v5, v4, v3 (hi) | … (valu dependency stall on V0 lo) | v_sub_f32 v6, v7, v0 (lo) | v_sub_f32 v6, v7, v0 (hi)
      Callouts: SHORTEST WAVE ISSUE LATENCY; 44% REDUCTION IN ISSUE CYCLES
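The timelines on this slide can be reproduced with a very small issue model. The sketch below is one reading of the slide, not AMD's scheduler: it assumes in-order issue, a 4-cycle occupancy per instruction for a GCN wave64, and result latencies of 2 cycles (scalar ALU) and 5 cycles (vector ALU) on RDNA. Those two latencies are chosen only because they reproduce the stall cycles shown on the slide; all names are illustrative.

```python
SALU_LATENCY = 2  # assumed: cycles from scalar issue until a dependent op may issue
VALU_LATENCY = 5  # assumed: cycles from vector issue until a dependent op may issue

# (opcode, destination register, source registers), as listed on the slide
PROGRAM = [
    ("s_add_i32", "s0", ("s1", "s2")),
    ("v_mul_f32", "v0", ("v1", "s0")),
    ("v_add_f32", "v5", ("v4", "v3")),
    ("v_sub_f32", "v6", ("v7", "v0")),
]


def gcn_wave64_cycles(program):
    """Wave64 on a 16-lane SIMD: every instruction occupies 4 issue cycles."""
    return 4 * len(program)


def rdna_cycles(program, halves):
    """In-order issue on a SIMD32; halves=1 models wave32, halves=2 models wave64."""
    ready = {}  # (register, half) -> first cycle at which a consumer may issue
    cycle = 0
    for opcode, dst, srcs in program:
        scalar = opcode.startswith("s_")
        for half in range(1 if scalar else halves):
            deps = [
                ready.get((src, None) if src.startswith("s") else (src, half), 0)
                for src in srcs
            ]
            issue = max([cycle] + deps)  # stall until all operands are ready
            latency = SALU_LATENCY if scalar else VALU_LATENCY
            ready[(dst, None) if scalar else (dst, half)] = issue + latency
            cycle = issue + 1
    return cycle


gcn = gcn_wave64_cycles(PROGRAM)
rdna_wave32 = rdna_cycles(PROGRAM, halves=1)
rdna_wave64 = rdna_cycles(PROGRAM, halves=2)
reduction = 1 - rdna_wave64 / gcn

print(gcn, rdna_wave32, rdna_wave64, f"{reduction:.0%}")
```

Under these assumptions the GCN wave64 takes 16 issue cycles, the RDNA wave32 takes 8, and the RDNA wave64 takes 9, which would account for the "44% reduction in issue cycles" callout (9 versus 16).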
  • 19. RDNA Redesigned Cache Hierarchy
      New L1 Cache Hierarchy
      Figure labels: PCIe® 4.0, SOC Fabric, GDDR6, Command Interfaces, Async Compute, Geometry, Rasterizer & Render Backends, Shader Complex, Texture, L1, L2
  • 20. RDNA Cache Hierarchy
      • Doubled the Load Bandwidth from L0 to ALU
      • Improved BW Amplification
      • Reduced Latency and Power
      • Reduced Congestion at L2 Level
      • Reduced Data Movement
      Figure annotations (bandwidth): 4x64B/C, 16x32/C, 32B/CLK, 32B/CLK, 128B/CLK, 64B/CLK, 128B/CLK, 64B/CLK, 2X, 2X
      Figure annotations (Relative Cache Latency): -24%, -21%, -7%
      See RX-329 in Endnotes.
  • 21. AMD Radeon™ RX 5700 Series Power Management
  • 22. AMD Radeon™ RX 5700 Series Power Management
      • Main functions
          • Manage clocks/voltages to maximize performance during active workloads
          • Draw minimum power during low activity conditions
      • Challenges
          • Highly variable GPU workloads
          • Parallel work leads to high power/current demand
          • Ever increasing needs for memory bandwidth
      • Power features include
          • AVFS to choose the most optimal per-part voltage
          • DVFS to choose the best operating point given current environment
          • Voltage droop mitigation and impact reduction
          • Agile responses to the current draw demands of the moment
          • Graceful throttling of the graphics core at thermal limits
          • Aggressive clock and power gating
  • 23. Power Management: Fine-grained DPM
      Previous Generation: Coarse-grained DPM
      • On previous architectures, operation was limited to a few (typically 8) pre-defined frequency values
      • Limited choice for Power Management Controller to choose from
      • Power/Thermal constrained effective frequency determined by dithering between neighboring coarse states
      AMD Radeon™ RX 5700 Series: Fine-grained DPM
      • Much finer-grained DPM state selection across the V/F curve
      • Improved perf/W efficiency by up to 5% as compared to previous generation by staying on the optimal curve
      • More accurate frequency selection between what the workload needs and what it gets
      Figure: coarse-grained DPM states DPM0 to DPM7 versus a continuous fine-grained DPM range, with Fmin, Fidle and Fmax marked
      *Based on AMD internal data.
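Why dithering between two coarse states costs more than running at the intermediate point can be seen from a toy model. The sketch below assumes dynamic power proportional to V²·f and a linear voltage/frequency curve; the curve coefficients, the eight coarse frequencies and all function names are hypothetical and are not taken from the deck.

```python
# Hypothetical coarse DPM states in MHz (the slide only says "typically 8").
COARSE_STATES_MHZ = [800, 1000, 1200, 1400, 1600, 1700, 1800, 1900]


def voltage(f_mhz: float) -> float:
    """Hypothetical linear V/F curve: 0.70 V at 800 MHz, rising 0.25 mV per MHz."""
    return 0.70 + 0.00025 * (f_mhz - 800.0)


def power(f_mhz: float) -> float:
    """Relative dynamic power, proportional to V^2 * f (arbitrary units)."""
    return voltage(f_mhz) ** 2 * f_mhz


def dithered_power(target_mhz: float, states=COARSE_STATES_MHZ) -> float:
    """Average power when time-slicing the two neighbouring coarse states
    so that the mean frequency equals the target (target must lie in range)."""
    lo = max(s for s in states if s <= target_mhz)
    hi = min(s for s in states if s >= target_mhz)
    if lo == hi:
        return power(lo)
    share_hi = (target_mhz - lo) / (hi - lo)  # fraction of time in the upper state
    return share_hi * power(hi) + (1.0 - share_hi) * power(lo)


target = 1500.0
print(f"fine-grained: {power(target):.1f}, dithered: {dithered_power(target):.1f}")
```

Because power in this model is convex in frequency, the time-weighted average of two neighbouring states always exceeds the power at the target frequency itself. The size of the gap depends entirely on the assumed curve and should not be read as the slide's "up to 5%" figure.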
  • 24. Power Management – Per Part Fmax
      Previous generations
      • Fmax determined by slowest part of distribution
      • Lower Cac workloads may leave power on the table for a large population of parts
      AMD Radeon™ RX 5700 Series
      • Each individual part allowed to achieve max potential (up to 15% higher) by selecting its own Vmax-limited Fmax based on the speed of the part
      • Enables applications with lower Cac to sustain higher clocks rather than be limited to artificially low limits set by slowest parts
      Based on AMD internal data
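The difference between a population-wide Fmax and a per-part Fmax is easy to illustrate. The sketch below uses a made-up set of five parts with made-up Vmax-limited frequencies; none of the numbers come from the deck, and the top value is picked only so that the largest uplift lands at 15% for illustration.

```python
# Hypothetical Vmax-limited Fmax per part, in MHz (illustrative values only).
part_fmax_mhz = {
    "part_a": 1900,
    "part_b": 1950,
    "part_c": 2000,
    "part_d": 2100,
    "part_e": 2185,
}

# Previous approach: every part is capped at the slowest part's Fmax.
population_fmax = min(part_fmax_mhz.values())

# Per-part approach: each part runs up to its own Vmax-limited Fmax.
uplift = {name: f / population_fmax - 1.0 for name, f in part_fmax_mhz.items()}

for name, gain in uplift.items():
    print(f"{name}: {part_fmax_mhz[name]} MHz, {gain:+.1%} vs population cap")
```

Only the slowest part sees no change; every faster part gains headroom that a single population-wide limit would have left unused.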
  • 25. AMD Radeon™ RX 5700 Series GDDR6 Interface
  • 26. GDDR6 Memory
      • 14 Gbps, 256b, 448 GB/s
      • Up to 75% BW per pin improvement over GDDR5
      • Up to 60% performance/Watt
      Based on AMD internal data
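The bandwidth figures on this slide follow from the per-pin data rate and the bus width. The sketch below reproduces them, taking the 8 Gbps GDDR5 rate from the RX 580 column of the summary table as the comparison point; the function name is illustrative.

```python
def memory_bandwidth_gbytes(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Aggregate memory bandwidth in GB/s: per-pin rate times pin count, in bytes."""
    return gbps_per_pin * bus_width_bits / 8


GDDR6_GBPS = 14.0   # per-pin data rate on the RX 5700 XT
GDDR5_GBPS = 8.0    # per-pin data rate on the RX 580 (summary table)
BUS_WIDTH_BITS = 256

gddr6_bandwidth = memory_bandwidth_gbytes(GDDR6_GBPS, BUS_WIDTH_BITS)
per_pin_gain = GDDR6_GBPS / GDDR5_GBPS - 1.0

print(f"GDDR6: {gddr6_bandwidth:.0f} GB/s, per-pin gain over GDDR5: {per_pin_gain:.0%}")
```

Both results match the slide: 448 GB/s in aggregate and a 75% per-pin improvement.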
  • 27. G6 PHY
      • Read Data Strobe (RDQS) mode to save power when high memory bandwidth is not required
      • T-coil provides bandwidth enhancement and improved return loss, enabling the high data rates on a single-ended interface
          • up to 16% in height
          • up to 26% in width
      Figure annotations: 40.2, 0.300, 50.8, 0.349
      Based on AMD internal data
  • 28. AMD Radeon™ RX 5700 Series Physical Design
  • 29. AMD Radeon™ RX 5700 Physical Design
      Physical Design challenges
      • New architecture
      • Frequency uplift beyond natural process uplift
      • Large design (10.3B transistors) with wide busses and complex crossbar structures
      • Decreasing dynamic switching capacitance while providing that frequency uplift
      • Operational logistics of managing such a large design
      Approach
      • High-performance clock distribution
      • Intelligent SRAM generation
      • Automated place and route while offering hooks for customization
      • Exploiting the benefits of the technology while compensating for the new challenges it brings
      • Careful use of mixed VTH cells to close timing gaps while maintaining power requirements
      • Balancing resource constraints and the desire for physical reuse against area and performance targets
      • Power aware floorplanning and bus planning working in conjunction with logic design teams
  • 30. Global Clock Distribution Optimizations
      Global clock distribution adopts low skew multiple mesh design style built of highly optimized configurable clock cells
      • Smaller mesh regions and optimized driver design reduces global distribution skew and variability costs by 30% for most synchronous paths
      Clock mesh wire power reduced up to 40% (normalized to area) by optimizing high level metal usage and reducing parasitic capacitance in clock drivers
      Figure labels: MESH1 to MESH7, connected by SPINE segments
  • 31. Local Clock Distribution Optimizations
      Configurable mixed-depth structured clock tree adopted for local clock distribution
      • Reduces median clock insertion by up to 50% which helps reduce jitter and PVT variability
      • Multiple levels of clock gating provides both coarse and fine control
      Bottom up expansion of clock tree adopted instead of region-based cloning
      • Local clock tree CAC decreased by up to 10% with load-based cloning
      Figure labels: SPINE (three segments)
  • 32. Bus planning
      Figure: Bus widths
      • Large busses (up to 2048b) and complex crossbar structures
      • Very large number of physical partitions need to be managed in an operational cadence
      • 60 unique designs with ~1-2M instance count, despite considerable reuse
      • Requires prototyping and proving of achieving performance targets well in advance of netlist drops
  • 33. AMD Radeon™ RX 5700 Series Conclusion
  • 34. AMD Radeon™ RX 5700 XT – RDNA Performance Improvements
      Figure annotations: 56CU vs. 40CU; 300W TBP vs. 225W TBP
      Based on internal testing. See endnote RX-363
  • 35. Conclusion
      AMD Radeon™ RX 5700 Series increased frequency while lowering active power and improving performance per clock
      Enabled by
      • Performance and power efficient next-generation AMD RDNA architecture
      • Increased memory bandwidth while maintaining power envelope and keeping costs low
      • Advanced power management techniques that allowed residency in the optimum power states while not limiting performance to the worst of the population
      • Attaining timing closure through reducing skew and jitter, improved bus planning, judicious use of Vt cells, and innovative floorplanning
      See endnotes RX-327, RX-325 and RX-362
  • 36. Acknowledgment
      We would like to thank our talented AMD design teams across Austin, Bangalore, Boston, Fort Collins, Hyderabad, Markham, Santa Clara, and Shanghai.
  • 37. Notes
      • GD-81: HEVC (H.265), H.264, and VP9 acceleration are subject to and not operable without inclusion/installation of compatible HEVC players. GD-81
      • GD-151: Boost Clock Frequency is the maximum frequency achievable on the GPU running a bursty workload. Boost clock achievability, frequency, and sustainability will vary based on several factors, including but not limited to: thermal conditions and variation in applications and workloads.
      • RX-325: Testing done by AMD performance labs 5/23/19, using the Division 2 @ 25x14 Ultra settings. Performance may vary based on use of latest drivers. RX-325
      • RX-327: Testing done by AMD performance labs 5/23/19, showing a geomean of 1.25x perf/clock across 30 different games @ 4K Ultra, 4xAA settings. Performance may vary based on use of latest drivers. RX-327
      • RX-329: Testing conducted by AMD Performance Labs as of 05/30/2019 on Radeon RX 5700XT with AMD Driver 19.10 (1902270946) on Intel i7-6900k, and on Radeon Vega Frontier Edition with AMD Driver 19.30 (1904231814) on Intel i7-5960k. Both systems used 2x8GB DDR4 2133 MHz RAM, Asus ROG Rampage V Edition Motherboard, and Windows 10 Enterprise. Performance may vary. RX-329
      • RX-362: Testing done by AMD performance labs on June 4, 2019. Systems were tested with: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz (6 core) with 16GB DDR4 @ 2133 MHz using an Asus X99-E Motherboard running Windows 10 Enterprise 64-bit (Ver. 1809, build 17763.053). Using the following graphics cards: Navi 10 (Driver 19.30_1905161434 (CL# 1784070)) with 40 compute units, versus a Vega 64 (Driver 19.4.1) with 40 compute units enabled. Breakdown based on AMD internal data June 4, 2019. Performance may vary. RX-362
      • RX-363: Testing done by AMD performance labs 5/30/2019 on Core i9-9900K (3.6 GHz), 16GB DDR4-3200MHz, GIGABYTE Z390 AORUS ELITE, Win 10 64-bit, AMD Driver 19.30 for RX5700, and 19.10-190502a for Vega 56. Measuring FPS using: Dirt Rally 2, Sid Meier's Civilization 6, Metro Exodus, Tom Clancy's Ghost Recon Wildlands, Shadow of the Tomb Raider, Battlefield 5, Assassin's Creed Odyssey, Call of Duty: Black Ops 4, The Division 2, Far Cry New Dawn. All at max settings. PC manufacturers may vary configurations yielding different results. Performance may vary based on use of latest drivers. RX-363
  • 38. Disclaimer and Endnotes
      DISCLAIMER
      The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
      All rights reserved. AMD, the AMD Arrow logo, combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.