1 | HOT CHIPS 28 | AUGUST 23, 2016
MIKE CLARK
SENIOR FELLOW
A NEW X86 CORE ARCHITECTURE FOR THE NEXT GENERATION OF COMPUTING
AGENDA
THE ROAD TO ZEN
HIGH LEVEL ARCHITECTURE
‐ IMPROVEMENTS IN CORE ENGINE
‐ FLOATING POINT
‐ IMPROVEMENTS IN CACHE SYSTEM
‐ SMT DESIGN TO MAXIMIZE THROUGHPUT
‐ NEW ISA EXTENSIONS
SUMMARY
NEXT STEP UP
AMD X86 CORES: DRIVING COMPETITIVE PERFORMANCE
[Chart: instructions per clock by core generation: “Bulldozer” core, “Excavator” core, “ZEN”. “ZEN” is labeled 40% more instructions per clock*.]
*Based on internal AMD estimates for “Zen” x86 CPU core compared to “Excavator” x86 CPU core.
AMD CPU DESIGN OPTIMIZATION POINTS
ONE CORE FROM FANLESS NOTEBOOKS TO SUPERCOMPUTERS
[Diagram: design-point spectrum from "LOWER POWER, SMALLER" to "MORE AREA, POWER, MORE PERFORMANCE". Labeled points: Low Power “Jaguar”, High Performance “Excavator”, and “ZEN” (PERFORMANCE).]
DEFYING CONVENTION: A WIDE, HIGH PERFORMANCE, EFFICIENT CORE
“ZEN”: +40% work per cycle* at equal energy per cycle. Total efficiency gain.
[Chart: Instructions-Per-Clock and Energy Per Cycle for “Bulldozer”, “Piledriver”, “Steamroller”, “Excavator”, and “ZEN”.]
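The efficiency claim on this slide reduces to simple arithmetic. The sketch below takes the +40% figure at face value and assumes energy per cycle is exactly equal between the two cores; the variable names are illustrative and do not come from the slides.

```python
# Worked arithmetic for "+40% work per cycle at equal energy per cycle".
# Assumption: energy per cycle is normalized to 1.0 for both cores.

ipc_gain = 0.40                  # from the slide (AMD internal estimate)
energy_per_cycle_ratio = 1.0     # "equal energy per cycle", assumed exact

work_per_cycle_ratio = 1.0 + ipc_gain

# Energy per unit of work = energy per cycle / work per cycle.
energy_per_work_ratio = energy_per_cycle_ratio / work_per_cycle_ratio

# Work per unit of energy is the reciprocal.
work_per_energy_ratio = 1.0 / energy_per_work_ratio

print(f"energy per unit of work: {energy_per_work_ratio:.3f}x")
print(f"work per unit of energy: {work_per_energy_ratio:.2f}x")
```

Under these assumptions, each unit of work costs roughly 71% of the energy it did before.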
ZEN PERFORMANCE & POWER IMPROVEMENTS
LOWER POWER
‐ Aggressive clock gating with multi-level regions
‐ Write back L1 cache
‐ Large Op Cache
‐ Stack Engine
‐ Move elimination
‐ Power focus from project inception
‐ Low Power Design Methodologies
BETTER CACHE SYSTEM
‐ Write back L1 cache
‐ Faster L2 cache
‐ Faster L3 cache
‐ Faster Load to FPU: 7 vs. 9 cycles
‐ Better L1 and L2 data prefetcher
‐ Close to 2x the L1 and L2 bandwidth
‐ Total L3 bandwidth up 5x
BETTER CORE ENGINE
‐ Two threads per core
‐ Branch mispredict improved
‐ Better branch prediction with 2 branches per BTB entry
‐ Large Op Cache
‐ Wider micro-op dispatch 6 vs. 4
‐ Larger Instruction Schedulers: Integer 84 vs. 48 | FP 96 vs. 60
‐ Larger retire 8 ops vs. 4 ops
‐ Quad issue FPU
‐ Larger Retire Queue 192 vs. 128
‐ Larger Load Queue 72 vs. 44
‐ Larger Store Queue 44 vs. 32
40% IPC PERFORMANCE UPLIFT
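To put the "X vs. Y" figures above in proportion, the following sketch computes the relative growth of each structure. It assumes the first number in each pair is Zen and the second is the prior core; the dictionary keys and the `growth` helper are illustrative names.

```python
# Structure sizes quoted on the slide, read as (Zen, prior core).
sizes = {
    "micro-op dispatch width": (6, 4),
    "integer scheduler entries": (84, 48),
    "FP scheduler entries": (96, 60),
    "retire width (ops)": (8, 4),
    "retire queue entries": (192, 128),
    "load queue entries": (72, 44),
    "store queue entries": (44, 32),
}


def growth(new: int, old: int) -> float:
    """Return the fractional increase of new over old."""
    return (new - old) / old


for name, (zen, prior) in sizes.items():
    print(f"{name:28s} {prior:4d} -> {zen:4d}  (+{growth(zen, prior):.0%})")
```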
ZEN MICROARCHITECTURE
• Fetch Four x86 instructions
• Op Cache instructions
• 4 Integer units
‒ Large rename space – 168 Registers
‒ 192 instructions in flight/8 wide retire
• 2 Load/Store units
‒ 72 Out-of-Order Loads supported
• 2 Floating Point units x 128-bit FMACs
‒ built as 4 pipes, 2 Fadd, 2 Fmul
• I-Cache 64K, 4-way
• D-Cache 32K, 8-way
• L2 Cache 512K, 8-way
• Large shared L3 cache
• 2 threads per core
[Diagram labels: Branch Prediction; 64K I-Cache, 4 way; Decode, 4 instructions/cycle; Op Cache; Micro-ops; Micro-op Queue; 6 ops dispatched; INTEGER: Integer Rename, 6 Schedulers, Integer Physical Register File, 4 ALUs, 2 AGUs, Load/Store Queues, 2 loads + 1 store per cycle, 32K D-Cache, 8 Way; FLOATING POINT: Floating Point Rename, Scheduler, FP Register File, MUL, ADD, MUL, ADD; 512K L2 (I+D) Cache, 8 Way]
FETCH
• Decoupled Branch Prediction
• TLB in the BP pipe
‒ 8 entry L0 TLB, all page sizes
‒ 64 entry L1 TLB, all page sizes
‒ 512 entry L2 TLB, no 1G pages
• 2 branches per BTB entry
• Large L1 / L2 BTB
• 32 entry return stack
• Indirect Target Array (ITA)
• 64K, 4-way Instruction cache
• Micro-tags for IC & Op cache
• 32 byte fetch
[Diagram labels: Next PC; L0/L1/L2 TLB; Hash Perceptron; L1/L2 BTB; Return Stack; ITA; Physical Request Queue; Micro-Tags; 64K Instruction Cache; 32 bytes/cycle from L2; 32 bytes to Decode; To Op Cache; Redirect from DE/EX]
DECODE
• Inline Instruction-length Decoder
• Decode 4 x86 instructions
• Op cache
• Micro-op Queue
• Stack Engine
• Branch Fusion
• Memory File for Store to Load Forwarding
[Diagram labels: From IC; From Micro Tags; Instruction Byte Buffer; Pick; Decode; Instructions; Op Cache; Micro-ops; Micro-op Queue; Microcode ROM; Stack Engine; Memfile; Dispatch; To EX, 6 Micro-ops; To FP, 4 Micro-ops]
EXECUTE
• 6x14 entry Scheduling Queues
• 168 entry Physical Register File
• 6 issue per cycle
‒ 4 ALUs, 2 AGUs
• 192 entry Retire Queue
• Differential Checkpoints
• 2 Branches per cycle
• Move Elimination
• 8-Wide Retire
[Diagram labels: 6 Micro-op Dispatch; Map; Retire Queue; To DE; Redirect to Fetch; ALQ0, ALQ1, ALQ2, ALQ3, AGQ0, AGQ1; 168 Entry Physical Register File; Forwarding Muxes; ALU0, ALU1, ALU2, ALU3, AGU0, AGU1; LS]
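A quick consistency check on the EXECUTE figures: six scheduling queues of 14 entries each give the same total as the integer scheduler figure of 84 quoted on the improvements slide, and four ALUs plus two AGUs account for the six-wide issue. The variable names below are illustrative.

```python
# Integer execution resources as listed on the EXECUTE slide.
queues = 6               # scheduling queues (ALQ0-3 and AGQ0-1 in the diagram)
entries_per_queue = 14
alus, agus = 4, 2

total_scheduler_entries = queues * entries_per_queue
issue_width = alus + agus

print(total_scheduler_entries)  # 84
print(issue_width)              # 6
```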
LOAD/STORE AND L2
• 72 Out of Order Loads
• 44 entry Store Queue
• Split TLB/Data Pipe, store pipe
• 64 entry L1 TLB, all page sizes
• 1.5K entry L2 TLB, no 1G pages
• 32K, 8 way Data Cache
‒ Supports two 128-bit accesses
• Optimized L1 and L2 Prefetchers
• 512K, private (2 threads), inclusive L2
[Diagram labels: AGU0, AGU1; Load Queue; Store Queue; L0 Pick; L1 Pick; Store Pipe Pick; STP; TLB0, TLB1; L1/L2 TLB + DC tags; DAT0, DAT1; 32K Data Cache; Store Commit; WCB; MAB; Pre Fetch; To Ex; To FP; To L2; 32 bytes to/from L2]
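The TLB entry counts above translate into address-space coverage ("reach") once a page size is chosen. This sketch assumes "1.5K entry" means 1536 entries and uses the standard x86-64 4K and 2M page sizes; `tlb_reach` is an illustrative helper, not something defined in the slides.

```python
KIB = 1024
MIB = 1024 * KIB

# Entry counts from the LOAD/STORE slide.
L1_DTLB_ENTRIES = 64      # all page sizes
L2_DTLB_ENTRIES = 1536    # "1.5K entry" (assumed to mean 1536), no 1G pages


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_size


for label, page in (("4K", 4 * KIB), ("2M", 2 * MIB)):
    l1 = tlb_reach(L1_DTLB_ENTRIES, page)
    l2 = tlb_reach(L2_DTLB_ENTRIES, page)
    print(f"{label} pages: L1 reach {l1 // KIB} KiB, L2 reach {l2 // MIB} MiB")
```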
FLOATING POINT
• 2 Level Scheduling Queue
• 160 entry Physical Register File
• 8 Wide Retire
• 1 pipe for 1x128b store
• Accelerated Recovery on Flushes
• SSE, AVX1, AVX2, AES, SHA, and legacy MMX/x87 compliant
• 2 AES units
[Diagram labels: 4 Micro-op Dispatch; NSQ; SQ; 160 Entry Physical Register File; Forwarding Muxes; MUL0, MUL1, ADD0, ADD1; LDCVT; 128 bit Loads; Int to FP; FP to Int; 192 Entry Retire Queue; 8 Micro-op Retire]
ZEN CACHE HIERARCHY
• Fast private 512K L2 cache
• Fast shared L3 cache
• High bandwidth enables prefetch improvements
• L3 is filled from L2 victims
• Fast cache-to-cache transfers
• Large Queues for Handling L1 and L2 misses
[Diagram labels: CORE 0; 64K I-Cache, 4-way (32B fetch); 32K D-Cache, 8-way (2*16B load, 1*16B store); 512K L2 I+D Cache, 8-way; 8M L3 I+D Cache, 16-way; links between cache levels labeled 32B/cycle]
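The capacities and associativities above determine how many sets each cache has once a line size is fixed. The slides do not state the line size; the sketch below assumes 64 bytes, and `num_sets` is an illustrative helper.

```python
LINE_BYTES = 64  # assumption: line size is not stated on the slides

# (capacity in bytes, associativity) from the cache hierarchy slide.
caches = {
    "I-Cache": (64 * 1024, 4),
    "D-Cache": (32 * 1024, 8),
    "L2": (512 * 1024, 8),
    "L3": (8 * 1024 * 1024, 16),
}


def num_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Sets = capacity / (associativity * line size)."""
    return size_bytes // (ways * line_bytes)


for name, (size, ways) in caches.items():
    print(f"{name}: {num_sets(size, ways)} sets")
```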
CPU COMPLEX
• A CPU complex (CCX) is four cores connected to an L3 Cache.
• The L3 Cache is 16-way associative, 8MB, mostly exclusive of L2.
• The L3 Cache is made of 4 slices, by low-order address interleave.
• Every core can access every cache with same average latency
[Diagram: four cores (CORE 0, CORE 1, CORE 2, CORE 3), each drawn with a 512K L2M, an L2CTL, an L3CTL, and two 1MB L3M blocks]
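"Low-order address interleave" can be illustrated with a toy mapping from physical address to L3 slice. The slide does not disclose which address bits are used, so the mapping below (the line index modulo the slice count, with an assumed 64-byte line) is a hypothetical sketch rather than the actual hardware function.

```python
LINE_BYTES = 64   # assumption: cache-line size is not given on the slide
NUM_SLICES = 4    # from the slide: "made of 4 slices"


def l3_slice(phys_addr: int) -> int:
    """Pick a slice from the lowest address bits above the line offset.

    Hypothetical mapping: the slide says only "low-order address
    interleave"; the exact bits used are not disclosed there.
    """
    line_index = phys_addr // LINE_BYTES
    return line_index % NUM_SLICES


# Consecutive cache lines rotate through the slices.
print([l3_slice(a) for a in range(0, 8 * LINE_BYTES, LINE_BYTES)])
```

With a mapping of this kind, a sequential access stream spreads evenly across the slices, which is consistent with the slide's claim of the same average latency from every core.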
SMT OVERVIEW
• All structures fully available in 1T mode
• Front End Queues are round robin with priority overrides
• Increased throughput from SMT
[Diagram labels: Branch Prediction; L0/L1/L2 ITLB; 64K I-Cache 4 way; Decode (instructions); Op-Cache (micro-ops); Micro-op Queue; 6 ops dispatched; INTEGER: Integer Rename, Schedulers, Integer Physical Register File, 4x ALUs, 2x AGUs; FLOATING POINT: Floating Point Rename, Scheduler, FP Register File, MUL, ADD, MUL, ADD; Retire Queue; Load Queue; Store Queue; L1/L2 DTLB; 32K D-Cache 8 Way; 512K L2 (I+D) Cache 8 Way. Each structure is marked with one of the sharing policies in the legend.]
Legend:
‐ Competitively shared structures
‐ Competitively shared and SMT Tagged
‐ Competitively shared with Algorithmic Priority
‐ Statically Partitioned
‐ Vertically Threaded
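"Round robin with priority overrides" can be modeled with a small arbiter. The slide does not describe the actual policy, so everything below (the class name, the `priority` argument, and the override condition) is a hypothetical illustration of the general technique.

```python
from typing import Optional, Sequence


class RoundRobinPicker:
    """Alternate between threads unless one is flagged as priority.

    Hypothetical model: the override condition and all names here are
    illustrative; the slide does not describe the actual policy.
    """

    def __init__(self, num_threads: int = 2) -> None:
        self.num_threads = num_threads
        self.last = num_threads - 1  # so thread 0 goes first

    def pick(self, ready: Sequence[bool],
             priority: Optional[int] = None) -> Optional[int]:
        # Priority override: a ready priority thread wins outright.
        if priority is not None and ready[priority]:
            self.last = priority
            return priority
        # Otherwise scan from the thread after the last winner.
        for step in range(1, self.num_threads + 1):
            tid = (self.last + step) % self.num_threads
            if ready[tid]:
                self.last = tid
                return tid
        return None  # nothing ready this cycle


picker = RoundRobinPicker()
print([picker.pick([True, True]) for _ in range(4)])  # alternates 0, 1, 0, 1
print(picker.pick([True, True], priority=1))          # override selects 1
```

When only one thread is ready, it wins every slot, which mirrors the slide's point that structures are fully available in 1T mode.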
NEW INSTRUCTIONS
Feature | Notes | Excavator | Zen
ADX | Extending multi-precision arithmetic support | | ✓
RDSEED | Complement to RDRAND random number generation | | ✓
SMAP | Supervisor Mode Access Prevention | | ✓
SHA1/SHA256 | Secure Hash Implementation Instructions | | ✓
CLFLUSHOPT | CLFLUSH ordered by SFENCE | | ✓
XSAVEC/XSAVES/XRSTORS | New Compact and Supervisor Save/Restore | | ✓
CLZERO | Clear Cache Line | | ✓
PTE Coalescing | Combines 4K page tables into 32K page size | | ✓
We support all the standard ISA including AVX & AVX-2, BMI1 & BMI2, AES, RDRAND, SMEP
AMD Exclusive (legend for rows highlighted on the slide)
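On Linux, the presence of most of these extensions can be checked from the `flags` line of `/proc/cpuinfo`. The sketch below assumes the kernel flag names shown in `ZEN_NEW_FLAGS` correspond to the slide's features; PTE coalescing has no flag and is omitted, and XRSTORS is assumed to be covered by the `xsaves` flag. The helper names are illustrative.

```python
# Linux /proc/cpuinfo flag names assumed to correspond to the slide's features.
ZEN_NEW_FLAGS = {
    "ADX": "adx",
    "RDSEED": "rdseed",
    "SMAP": "smap",
    "SHA1/SHA256": "sha_ni",
    "CLFLUSHOPT": "clflushopt",
    "XSAVEC": "xsavec",
    "XSAVES": "xsaves",
    "CLZERO": "clzero",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the flag set from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text: str) -> dict:
    """Map each feature name to True/False based on the parsed flags."""
    flags = parse_flags(cpuinfo_text)
    return {feature: flag in flags for feature, flag in ZEN_NEW_FLAGS.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or the file is unreadable
    for feature, present in report(text).items():
        print(f"{feature:12s} {'yes' if present else 'no'}")
```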
“ZEN”: Totally New High-performance Core Design
DESIGNED FROM THE GROUND UP FOR OPTIMAL BALANCE OF PERFORMANCE AND POWER
Simultaneous Multithreading (SMT) for High Throughput
Energy-efficient FinFET Design Scales from Enterprise to Client Products
New High-Bandwidth, Low Latency Cache System
A COMMITTED ROADMAP TO X86 PERFORMANCE
[Chart: instructions per clock by core generation: “Bulldozer” core, “Excavator” core, “ZEN”, “ZEN+”. “ZEN” is labeled 40% more instructions per clock*.]
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and
motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like.
AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time
to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS
THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR
ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2016 Advanced Micro Devices, Inc. All rights reserved. AMD, Radeon, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United
States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

 

Dernier (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

AMD and the new “Zen” High Performance x86 Core at Hot Chips 28

  • 1. 1 | HOT CHIPS 28 | AUGUST 23, 2016 MIKE CLARK SENIOR FELLOW A NEW X86 CORE ARCHITECTURE FOR THE NEXT GENERATION OF COMPUTING
  • 2. AGENDA
      ‐ THE ROAD TO ZEN
      ‐ HIGH LEVEL ARCHITECTURE
          ‒ IMPROVEMENTS IN CORE ENGINE
          ‒ FLOATING POINT
          ‒ IMPROVEMENTS IN CACHE SYSTEM
          ‒ SMT DESIGN TO MAXIMIZE THROUGHPUT
          ‒ NEW ISA EXTENSIONS
      ‐ SUMMARY
      ‐ NEXT STEP UP
  • 3. AMD X86 CORES: DRIVING COMPETITIVE PERFORMANCE
      [Chart: instructions per clock for the “Bulldozer” core, the “Excavator” core, and “ZEN”. Callout: 40% More Instructions Per Clock*]
      *Based on internal AMD estimates for “Zen” x86 CPU core compared to “Excavator” x86 CPU core.
  • 4. AMD CPU DESIGN OPTIMIZATION POINTS
      ONE CORE FROM FANLESS NOTEBOOKS TO SUPERCOMPUTERS
      [Diagram: design points for Low Power “Jaguar”, High Performance “Excavator”, and “ZEN”. Labels: PERFORMANCE; LOWER POWER; SMALLER; MORE AREA, POWER; MORE PERFORMANCE]
  • 5. DEFYING CONVENTION: A WIDE, HIGH PERFORMANCE, EFFICIENT CORE
      ‐ +40% work per cycle*
      ‐ At = Energy Per Cycle
      ‐ Total Efficiency Gain
      [Chart: Instructions-Per-Clock and Energy Per Cycle for “Bulldozer”, “Piledriver”, “Steamroller”, “Excavator”, and “ZEN”]
      *Based on internal AMD estimates for “Zen” x86 CPU core compared to “Excavator” x86 CPU core.
  • 6. ZEN PERFORMANCE & POWER IMPROVEMENTS
      ‐ LOWER POWER
          ‒ Aggressive clock gating with multi-level regions
          ‒ Write back L1 cache
          ‒ Large Op Cache
          ‒ Stack Engine
          ‒ Move elimination
          ‒ Power focus from project inception
          ‒ Low Power Design Methodologies
      ‐ BETTER CACHE SYSTEM
          ‒ Write back L1 cache
          ‒ Faster L2 cache
          ‒ Faster L3 cache
          ‒ Faster Load to FPU: 7 vs. 9 cycles
          ‒ Better L1 and L2 data prefetcher
          ‒ Close to 2x the L1 and L2 bandwidth
          ‒ Total L3 bandwidth up 5x
      ‐ BETTER CORE ENGINE
          ‒ Two threads per core
          ‒ Branch mispredict improved
          ‒ Better branch prediction with 2 branches per BTB entry
          ‒ Large Op Cache
          ‒ Wider micro-op dispatch: 6 vs. 4
          ‒ Larger Instruction Schedulers: Integer 84 vs. 48 | FP 96 vs. 60
          ‒ Larger retire: 8 ops vs. 4 ops
          ‒ Quad issue FPU
          ‒ Larger Retire Queue: 192 vs. 128
          ‒ Larger Load Queue: 72 vs. 44
          ‒ Larger Store Queue: 44 vs. 32
      ‐ 40% IPC PERFORMANCE UPLIFT
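To put the "vs." figures on slide 6 side by side, the following minimal sketch computes the ratio for each pair. It assumes the first number in each pair is Zen and the second is Excavator, which is how the deck frames its other comparisons. The dictionary keys and function names are illustrative and not taken from the slides.

```python
# Structure sizes quoted on slide 6 as (first figure, second figure).
# Assumption: the first figure is Zen, the second is Excavator.
SIZES = {
    "micro-op dispatch width": (6, 4),
    "integer scheduler entries": (84, 48),
    "FP scheduler entries": (96, 60),
    "retire width (ops)": (8, 4),
    "retire queue entries": (192, 128),
    "load queue entries": (72, 44),
    "store queue entries": (44, 32),
}


def growth(zen: int, excavator: int) -> float:
    """Return how many times larger the Zen figure is."""
    return zen / excavator


def report(sizes=SIZES) -> dict:
    """Map each structure name to its Zen/Excavator ratio."""
    return {name: growth(zen, exc) for name, (zen, exc) in sizes.items()}


for name, ratio in report().items():
    print(f"{name}: {ratio:.2f}x")
```

These ratios describe structure sizes only. The 40% IPC figure is AMD's own estimate and cannot be derived from them.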
  • 7. ZEN MICROARCHITECTURE
      ‐ Fetch Four x86 instructions
      ‐ Op Cache instructions
      ‐ 4 Integer units
          ‒ Large rename space: 168 Registers
          ‒ 192 instructions in flight/8 wide retire
      ‐ 2 Load/Store units
          ‒ 72 Out-of-Order Loads supported
      ‐ 2 Floating Point units x 128 FMACs
          ‒ built as 4 pipes, 2 Fadd, 2 Fmul
      ‐ I-Cache 64K, 4-way
      ‐ D-Cache 32K, 8-way
      ‐ L2 Cache 512K, 8-way
      ‐ Large shared L3 cache
      ‐ 2 threads per core
      [Block diagram: Branch Prediction; 64K I-Cache 4 way; Decode (4 instructions/cycle); Op Cache; Micro-op Queue (6 ops dispatched). INTEGER: Integer Rename, Schedulers, Integer Physical Register File, 4 ALUs, 2 AGUs, Load/Store Queues (2 loads + 1 store per cycle), 32K D-Cache 8 Way. FLOATING POINT: Floating Point Rename, Scheduler, FP Register File, ADD, MUL, ADD, MUL. 512K L2 (I+D) Cache 8 Way]
  • 8. FETCH
      ‐ Decoupled Branch Prediction
      ‐ TLB in the BP pipe
          ‒ 8 entry L0 TLB, all page sizes
          ‒ 64 entry L1 TLB, all page sizes
          ‒ 512 entry L2 TLB, no 1G pages
      ‐ 2 branches per BTB entry
      ‐ Large L1 / L2 BTB
      ‐ 32 entry return stack
      ‐ Indirect Target Array (ITA)
      ‐ 64K, 4-way Instruction cache
      ‐ Micro-tags for IC & Op cache
      ‐ 32 byte fetch
      [Block diagram: Next PC; L0/L1/L2 TLB; L1/L2 BTB; Return Stack; ITA; Hash Perceptron; Micro-Tags; Physical Request Queue; 64K Instruction Cache; To Op Cache; 32 bytes to Decode; 32 bytes/cycle from L2; Redirect from DE/EX]
  • 9. DECODE
      ‐ Inline Instruction-length Decoder
      ‐ Decode 4 x86 instructions
      ‐ Op cache
      ‐ Micro-op Queue
      ‐ Stack Engine
      ‐ Branch Fusion
      ‐ Memory File for Store to Load Forwarding
      [Block diagram: From IC; From Micro Tags; Instruction Byte Buffer; Pick; Decode; Op Cache; Micro-op Queue; Microcode Rom; Stack Engine; Memfile; Dispatch; To EX, 6 Micro-ops; To FP, 4 Micro-ops]
  • 10. EXECUTE
      ‐ 6x14 entry Scheduling Queues
      ‐ 168 entry Physical Register File
      ‐ 6 issue per cycle
          ‒ 4 ALUs, 2 AGUs
      ‐ 192 entry Retire Queue
      ‐ Differential Checkpoints
      ‐ 2 Branches per cycle
      ‐ Move Elimination
      ‐ 8-Wide Retire
      [Block diagram: 6 Micro-op Dispatch; Map; Retire Queue; 168 Entry Physical Register File; Forwarding Muxes; scheduling queues ALQ0 to ALQ3, AGQ0, AGQ1; execution units ALU0 to ALU3, AGU0, AGU1; LS; To DE; Redirect to Fetch]
  • 11. LOAD/STORE AND L2
      ‐ 72 Out of Order Loads
      ‐ 44 entry Store Queue
      ‐ Split TLB/Data Pipe, store pipe
      ‐ 64 entry L1 TLB, all page sizes
      ‐ 1.5K entry L2 TLB, no 1G pages
      ‐ 32K, 8 way Data Cache
          ‒ Supports two 128-bit accesses
      ‐ Optimized L1 and L2 Prefetchers
      ‐ 512K, private (2 threads), inclusive L2
      [Block diagram: AGU0; AGU1; Load Queue; Store Queue; L0 Pick; L1 Pick; Store Pipe Pick; Pre Fetch; TLB0; TLB1; L1/L2 TLB + DC tags; DAT0; DAT1; STP; 32K Data Cache; Store Commit; WCB; MAB; To Ex; To FP; To L2; 32 bytes to/from L2]
  • 12. FLOATING POINT
      ‐ 2 Level Scheduling Queue
      ‐ 160 entry Physical Register File
      ‐ 8 Wide Retire
      ‐ 1 pipe for 1x128b store
      ‐ Accelerated Recovery on Flushes
      ‐ SSE, AVX1, AVX2, AES, SHA, and legacy mmx/x87 compliant
      ‐ 2 AES units
      [Block diagram: 4 Micro-op Dispatch; NSQ; SQ; 160 Entry Physical Register File; Forwarding Muxes; MUL0; ADD0; ADD1; MUL1; LDCVT; 128 bit Loads; Int to FP; FP to Int; 192 Entry Retire Queue; 8 Micro-op Retire]
  • 13. ZEN CACHE HIERARCHY
      ‐ Fast private 512K L2 cache
      ‐ Fast shared L3 cache
      ‐ High bandwidth enables prefetch improvements
      ‐ L3 is filled from L2 victims
      ‐ Fast cache-to-cache transfers
      ‐ Large Queues for Handling L1 and L2 misses
      [Diagram: CORE 0 (32B fetch; 2*16B load; 1*16B store); 64K I-Cache 4-way; 32K D-Cache 8-way; 512K L2 I+D Cache 8-way; 8M L3 I+D Cache 16-way; links between levels labeled 32B/cycle]
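The sizes and associativities on slide 13 determine how many sets each cache has once a line size is fixed. The slides do not state the line size, so the sketch below assumes 64-byte lines. The function name `num_sets` and the `CACHES` table are illustrative.

```python
LINE_BYTES = 64  # assumption: the line size is not stated on the slides

# (capacity in bytes, associativity) as quoted on slide 13.
CACHES = {
    "I-Cache": (64 * 1024, 4),
    "D-Cache": (32 * 1024, 8),
    "L2": (512 * 1024, 8),
    "L3": (8 * 1024 * 1024, 16),
}


def num_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets = capacity / (ways * line size)."""
    lines, remainder = divmod(size_bytes, line_bytes)
    if remainder or lines % ways:
        raise ValueError("capacity is not a whole number of sets")
    return lines // ways


for name, (size, ways) in CACHES.items():
    print(f"{name}: {num_sets(size, ways)} sets")
```

With a different line size the set counts scale inversely. The capacities and way counts are the only values taken from the deck.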
  • 14. CPU COMPLEX
      ‐ A CPU complex (CCX) is four cores connected to an L3 Cache.
      ‐ The L3 Cache is 16-way associative, 8MB, mostly exclusive of L2.
      ‐ The L3 Cache is made of 4 slices, by low-order address interleave.
      ‐ Every core can access every cache with the same average latency.
      [Diagram: four cores, each drawn with L2CTL, L2M 512K, L3CTL, and two L3M 1MB blocks]
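Slide 14 says the L3 is split into 4 slices "by low-order address interleave" but does not say which address bits are used. As a rough sketch of the idea, the code below assumes 64-byte cache lines and that the two bits just above the line offset select the slice. Both are assumptions made for illustration, not details from the deck.

```python
from collections import Counter

LINE_BYTES = 64  # assumption: line size is not given on the slide
NUM_SLICES = 4   # from the slide: the L3 is made of 4 slices


def l3_slice(phys_addr: int) -> int:
    """Pick a slice from the low-order bits of the cache-line number.

    Hypothetical mapping: consecutive lines rotate through the slices.
    """
    line_number = phys_addr // LINE_BYTES
    return line_number % NUM_SLICES


# A sequential 4 KiB region spreads evenly across the slices.
spread = Counter(l3_slice(addr) for addr in range(0, 4096, LINE_BYTES))
print(dict(spread))
```

Interleaving on low-order bits spreads a sequential access stream over all slices, which fits the slide's point that every core sees the same average latency to the L3.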
  • 15. SMT OVERVIEW
      ‐ All structures fully available in 1T mode
      ‐ Front End Queues are round robin with priority overrides
      ‐ Increased throughput from SMT
      [Block diagram: the core pipeline (Branch Prediction; L0/L1/L2 ITLB; 64K I-Cache 4 way; Decode; Micro-op Cache; Micro-op Queue, 6 ops dispatched; Integer Rename; Floating Point Rename; Schedulers; Integer Physical Register File; FP Register File; 4x ALUs; 2x AGUs; ADD; MUL; ADD; MUL; Retire Queue; Load Queue; Store Queue; L1/L2 DTLB; 32K D-Cache 8 Way; 512K L2 (I+D) Cache 8 Way), with each structure marked by its SMT sharing policy. Legend: Competitively shared structures; Competitively shared and SMT Tagged; Competitively shared with Algorithmic Priority; Statically Partitioned; Vertically Threaded]
  • 16. NEW INSTRUCTIONS
      [Table with columns Feature, Notes, Excavator, Zen. Each row carries one check mark; the column it belongs to is not recoverable from this transcript. An “AMD Exclusive” label also appears on the slide.]
      ‐ ADX: Extending multi-precision arithmetic support
      ‐ RDSEED: Complement to RDRAND random number generation
      ‐ SMAP: Supervisor Mode Access Prevention
      ‐ SHA1/SHA256: Secure Hash Implementation Instructions
      ‐ CLFLUSHOPT: CLFLUSH ordered by SFENCE
      ‐ XSAVEC/XSAVES/XRSTORS: New Compact and Supervisor Save/Restore
      ‐ CLZERO: Clear Cache Line
      ‐ PTE Coalescing: Combines 4K page tables into 32K page size
      We support all the standard ISA including AVX & AVX-2, BMI1 & BMI2, AES, RDRAND, SMEP
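On a Linux system, most of the extensions listed on slide 16 are reported as flags in `/proc/cpuinfo`. The sketch below checks a block of cpuinfo text for them. The mapping from slide feature names to flag names (for example `sha_ni` for SHA1/SHA256 and `xsaves` for XSAVES/XRSTORS) is an assumption based on common Linux naming, and PTE Coalescing is left out because no flag is assumed for it. The function names are illustrative.

```python
# Assumed mapping from the slide's feature names to Linux cpuinfo flags.
FEATURE_FLAGS = {
    "ADX": "adx",
    "RDSEED": "rdseed",
    "SMAP": "smap",
    "SHA1/SHA256": "sha_ni",
    "CLFLUSHOPT": "clflushopt",
    "XSAVEC": "xsavec",
    "XSAVES/XRSTORS": "xsaves",
    "CLZERO": "clzero",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the flag set from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_features(cpuinfo_text: str) -> dict:
    """Map each slide feature to True if its assumed flag is present."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}


# Made-up sample text, not output from a real machine.
sample = "processor\t: 0\nflags\t\t: fpu adx rdseed smap sha_ni clzero\n"
print(supported_features(sample))
```

To check a real machine, pass the contents of `/proc/cpuinfo` to `supported_features` instead of the sample string.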
  • 17. “ZEN”: DESIGNED FROM THE GROUND UP FOR OPTIMAL BALANCE OF PERFORMANCE AND POWER
      ‐ Totally New High-performance Core Design
      ‐ Simultaneous Multithreading (SMT) for High Throughput
      ‐ Energy-efficient FinFET Design Scales from Enterprise to Client Products
      ‐ New High-Bandwidth, Low Latency Cache System
  • 18. A COMMITTED ROADMAP TO X86 PERFORMANCE
      [Chart: instructions per clock for the “Bulldozer” core, the “Excavator” core, “ZEN”, and “ZEN+”. Callout: 40% More Instructions Per Clock*]
      *Based on internal AMD estimates for “Zen” x86 CPU core compared to “Excavator” x86 CPU core.
  • 19. DISCLAIMER & ATTRIBUTION
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
      AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION: © 2016 Advanced Micro Devices, Inc. All rights reserved. AMD, Radeon, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.