Fujitsu High Performance CPU
for the Post-K Computer
August 21st, 2018
Toshio Yoshida
FUJITSU LIMITED
All Rights Reserved. Copyright © FUJITSU LIMITED 2018
Key Message
A64FX is the new Fujitsu-designed Arm processor
• It is used in the post-K computer
A64FX is the first processor of the Armv8-A SVE architecture
• Fujitsu, as a lead partner, collaborated closely with Arm on the development of SVE
A64FX achieves high performance in HPC and AI areas
• Our own microarchitecture maximizes the capability of SVE
Outline
• Fujitsu Processor Development
• A64FX
  - Overview
  - Microarchitecture
  - Performance
  - Power Management
  - RAS
• Software Development
• Summary
Fujitsu Processor Development
Persistent Evolution > 60 years
[Figure: timeline of Fujitsu processor development in five periods (2000~2003, 2004~2007, 2008~2011, 2012~2017, 2018~) across three product lines.
 Mainframe: GS8600, GS8800, GS8800B, GS8900, GS21 600, GS21 900, GS21 1600, GS21 2600, GS21 3600, Next GS.
 UNIX: SPARC64, SPARC64 II, SPARC64 GP, SPARC64 V, SPARC64 V+, SPARC64 VI, SPARC64 VII, SPARC64 X, SPARC64 X+, SPARC64 XII, Next SPARC.
 HPC: SPARC64 VIIIfx, SPARC64 IXfx, SPARC64 XIfx. HPC+AI: DLU, A64FX (Post-K*, Arm).
 Performance technologies: Store Ahead, Branch History, Prefetch, Single-chip CPU, Non-Blocking $, O-O-O Execution, Super-Scalar, L2$ on Die, HPC-ACE, System on Chip, Hardware Barrier, Multi-core Multi-thread, Virtual Machine Architecture, Software on Chip, High-speed Interconnect.
 Reliability technologies: $ ECC, Register/ALU Parity, Instruction Retry, $ Dynamic Degradation, Error Checkers/History.
 Technology generations: 350nm, 250nm / 220nm, 180nm, 130nm, 90nm, 65nm, 45nm, 40nm, 28nm, 20nm, 7nm. Other labels: CMOS Cu, Tr=1B.]
* Post-K is under development by RIKEN and Fujitsu
Scalable Many-core Architecture
512-bit SIMD for HPC and AI
High Bandwidth Memory
DNA of Fujitsu Processors
A64FX inherits DNA from Fujitsu technologies used in the mainframes, UNIX and HPC servers
[Figure: four groups of characteristics.
 • High reliability, Stability, Integrity, Continuity
 • High performance-per-watt, Execution and memory throughput, Low power, Massively parallel
 • High speed & flexibility, Thread performance, Software on Chip, Large SMP
 • CPU w/ extremely high throughput: High performance, Massively parallel, Low power, Stability and integrity
 Image credit: (C) RIKEN]
A64FX Designed for HPC/AI
A64FX = CPU with extremely high throughput
• Vector : 512-bit wide SIMD x 2 pipes/core
• Memory : HBM2 (extremely high B/W)
• Scalable : 48 cores, Tofu interconnect
1. High Performance
• HPC/AI apps. >> General purpose CPU
• Various data types (FP64/32/16, INT64/32/16/8)
2. High Throughput
• Performance: (D|S|H)GEMM >90%, Stream Triad >80%
3. High Efficiency
• Perf-per-watt >> General purpose CPU
4. Standard
• Binary compatibility with Armv8.2-A + SVE + SBSA* level3
*Arm’s “Server Base System Architecture”
A64FX Chip Overview
Architecture Features
• Armv8.2-A (AArch64 only)
• SVE 512-bit wide SIMD
• 48 computing cores + 4 assistant cores*
• HBM2 32GiB
• Tofu 6D Mesh/Torus, 28Gbps x 2 lanes x 10 ports
• PCIe Gen3 16 lanes
7nm FinFET
• 8,786M transistors
• 594 package signal pins
Peak Performance (Efficiency)
• >2.7TFLOPS (>90%@DGEMM)
• Memory B/W 1024GB/s (>80%@Stream Triad)
*All the cores are identical
[Figure: <A64FX> chip block diagram. HBM2 stacks, the Network on Chip, the Tofu controller and the PCIe controller.
 CMG specification: 13 cores, L2$ 8MiB, Mem 8GiB, 256GB/s.
 I/O: PCIe Gen3 16 lanes; Tofu 28Gbps 2 lanes 10 ports.]

                    A64FX          SPARC64 XIfx
                    (Post-K)       (PRIMEHPC FX100)
ISA (Base)          Armv8.2-A      SPARC-V9
ISA (Extension)     SVE            HPC-ACE2
Process Node        7nm            20nm
Peak Performance    >2.7TFLOPS     1.1TFLOPS
SIMD                512-bit        256-bit
# of Cores          48+4           32+2
Memory              HBM2           HMC
Memory Peak B/W     1024GB/s       240GB/s x2 (in/out)
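The headline figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes one fused multiply-add counts as two operations and that only the 48 computing cores contribute to peak; the implied clock frequency is an inference from the quoted ">2.7TFLOPS", not a figure stated in the slides.

```python
# Figures taken from the slides; everything derived from them is plain arithmetic.
COMPUTE_CORES = 48        # assistant cores are not counted toward peak (assumption)
SIMD_BITS = 512
SIMD_PIPES_PER_CORE = 2   # "512-bit wide SIMD x 2 pipes/core"
OPS_PER_FMA = 2           # one multiply plus one add (assumption)


def ops_per_cycle(element_bits: int) -> int:
    """Chip-level operations per cycle for a given element width."""
    lanes = SIMD_BITS // element_bits
    return COMPUTE_CORES * SIMD_PIPES_PER_CORE * lanes * OPS_PER_FMA


def implied_min_clock_ghz(peak_tflops: float, element_bits: int = 64) -> float:
    """Clock needed to reach the quoted peak (an inference, not a published spec)."""
    return peak_tflops * 1e12 / ops_per_cycle(element_bits) / 1e9


hbm2_total_gb_s = 4 * 256      # four CMGs x 256GB/s each
tofu_raw_gbps = 28 * 2 * 10    # 28Gbps x 2 lanes x 10 ports

print(ops_per_cycle(64))                      # 1536
print(round(implied_min_clock_ghz(2.7), 2))   # 1.76
print(hbm2_total_gb_s, tofu_raw_gbps)         # 1024 560
```

The 4 x 256GB/s product reproduces the quoted 1024GB/s memory bandwidth, which suggests the per-CMG and chip-level figures are consistent with each other.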
A64FX Features
Collaboration with Arm to develop and optimize SVE for a wide range of applications
• FP16 and INT16/8 dot product are introduced for AI applications

                            A64FX                SPARC64 XIfx          SPARC64 VIIIfx
                            (Post-K)             (PRIMEHPC FX100)      (K computer)
ISA                         Armv8.2-A + SVE      SPARC-V9 + HPC-ACE2   SPARC-V9 + HPC-ACE
SIMD Width                  512-bit              256-bit               128-bit
Four-operand FMA            ✓ Enhanced           ✓                     ✓
Gather/Scatter              ✓ Enhanced           ✓                     -
Predicated Operations       ✓ Enhanced           ✓                     ✓
Math. Acceleration          ✓ Further enhanced   ✓ Enhanced            ✓
Compress                    ✓ Enhanced           ✓                     -
First Fault Load            ✓ New                -                     -
FP16                        ✓ New                -                     -
INT16/INT8 Dot Product      ✓ New                -                     -
HW Barrier* / Sector Cache* ✓ Further enhanced   ✓ Enhanced            ✓
* Utilizing AArch64 implementation-defined system registers
A64FX Core Pipeline
A64FX enhances and inherits superior features of SPARC64
• Inherits superscalar, out-of-order, branch prediction, etc.
• Enhances SIMD and predicate operations
  – 2x 512-bit wide SIMD FMA + Predicate Operation + 4x ALU (shared w/ 2x AGEN)
  – 2x 512-bit wide SIMD load or 512-bit wide SIMD store
[Figure: core pipeline block diagram, with the main enhanced blocks highlighted.
 Stages: Fetch, Issue, Dispatch, Reg-Read, Execute, Cache and Memory, Commit.
 Front end: L1 I$, Branch Predictor, Decode & Issue.
 Reservation stations: RSE0, RSE1, RSA, RSBR.
 Register files: PGPR, PFPR, PPR.
 Execution units: EXA, EXB, EXC, EXD, EAGA, EAGB, FLA, FLB, PRX.
 Memory side: Fetch Port, Store Port, Write Buffer, L1D$, L2$, HBM2 Controller, HBM2.
 Commit: CSE, PC, Control Registers.
 Chip level (52 cores): Tofu controller, Tofu Interconnect, PCI Controller, PCI-GEN3.]
Four-operand FMA with Prefix Instruction
MOVPRFX as a prefix instruction
• For SVE, four-operand “FMA4” requires a prefix instruction (MOVPRFX) followed by a destructive 3-operand FMA3
A64FX implementation for MOVPRFX
• A64FX hides the overhead of its main pipeline by packing MOVPRFX and the following instruction into a single operation
“FMA4” : Z0 <= Z3+Z1*Z2
is equivalent to two instructions:
MOVPRFX : Z0 <= Z3
FMA3 : Z0 <= Z0+Z1*Z2 (destructive)
[Figure: instruction flow through L1I$/L2$/Memory, Inst. Buffer, Pre-Decode, Inst. Decode, Exec. Pipeline, CSE and PC. MOVPRFX and FMA3 are handled as two instructions up to Pre-Decode, are packed into a single “FMA4” operation for decode and execution, and are treated as two instructions again at CSE/PC.]
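The packing idea described above can be illustrated with a toy decoder. This is a minimal sketch, not Fujitsu's implementation: the `Inst` class, the `pack` and `execute` functions, and the use of scalars in place of 512-bit vectors are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Inst:
    op: str                 # "MOVPRFX", "FMA3" or the fused "FMA4" in this toy model
    dst: str
    srcs: Tuple[str, ...]


def pack(stream: List[Inst]) -> List[Inst]:
    """Fuse MOVPRFX followed by a destructive FMA3 on the same register into one op."""
    out: List[Inst] = []
    i = 0
    while i < len(stream):
        cur = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if (cur.op == "MOVPRFX" and nxt is not None
                and nxt.op == "FMA3" and nxt.dst == cur.dst):
            # FMA4 sources: addend first, then the two multiplicands.
            out.append(Inst("FMA4", cur.dst, (cur.srcs[0],) + nxt.srcs))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


def execute(insts: List[Inst], regs: Dict[str, float]) -> Dict[str, float]:
    """Run the toy instructions on a copy of the register file."""
    regs = dict(regs)
    for inst in insts:
        if inst.op == "MOVPRFX":        # dst <= src
            regs[inst.dst] = regs[inst.srcs[0]]
        elif inst.op == "FMA3":         # dst <= dst + a*b (destructive)
            a, b = inst.srcs
            regs[inst.dst] = regs[inst.dst] + regs[a] * regs[b]
        elif inst.op == "FMA4":         # dst <= c + a*b
            c, a, b = inst.srcs
            regs[inst.dst] = regs[c] + regs[a] * regs[b]
        else:
            raise ValueError(f"unknown op {inst.op}")
    return regs


program = [Inst("MOVPRFX", "Z0", ("Z3",)), Inst("FMA3", "Z0", ("Z1", "Z2"))]
regs = {"Z0": 0.0, "Z1": 2.0, "Z2": 3.0, "Z3": 10.0}
packed = pack(program)
print(len(program), "->", len(packed))                           # 2 -> 1
print(execute(program, regs)["Z0"], execute(packed, regs)["Z0"])  # 16.0 16.0
```

Both forms leave the same value in Z0, while the packed form occupies one slot in the toy pipeline instead of two.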
Execution Unit
Extremely high throughput
• 512-bit wide SIMD x 2 Pipelines x 48 Cores
• >90% execution efficiency in (D|S|H)GEMM and INT16/8 dot product
[Figure: INT8 dot product. Four 8-bit pairs A0 x B0, A1 x B1, A2 x B2, A3 x B3 are combined into a 32-bit value C.]

Peak Performance (Chip-level), in TOPS
Element size                        SPARC64 VIIIfx   SPARC64 XIfx       A64FX
                                    (K computer)     (PRIMEHPC FX100)   (Post-K)
                                    128-bit SIMD     256-bit SIMD       512-bit SIMD
64-bit (DGEMM)                      0.128            1.1                >2.7
32-bit (SGEMM)                      0.128            2.2                >5.4
16-bit (HGEMM, INT16 dot product)   N/A              N/A                >10.8
8-bit (INT8 dot product) *1         N/A              N/A                >21.6
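As an illustration of the INT8 dot product in the figure, the sketch below models one 32-bit lane that accumulates four 8-bit products, then applies it across a 512-bit vector. Signed operands and wrap-around 32-bit accumulation are assumptions here; the function names are hypothetical.

```python
from typing import List, Sequence


def to_int32(x: int) -> int:
    """Wrap an integer to the signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def dot4_int8(acc: int, a: Sequence[int], b: Sequence[int]) -> int:
    """One 32-bit lane: acc + a0*b0 + a1*b1 + a2*b2 + a3*b3 with signed 8-bit inputs."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each lane consumes exactly four 8-bit pairs")
    for v in (*a, *b):
        if not -128 <= v <= 127:
            raise ValueError("operand does not fit in a signed 8-bit element")
    return to_int32(acc + sum(x * y for x, y in zip(a, b)))


def dot_vector_512(acc: List[int], a: List[int], b: List[int]) -> List[int]:
    """A 512-bit vector holds 16 32-bit lanes, each fed by four 8-bit pairs."""
    if len(acc) != 16 or len(a) != 64 or len(b) != 64:
        raise ValueError("expected 16 accumulators and 64 int8 elements per operand")
    return [dot4_int8(acc[i], a[4 * i:4 * i + 4], b[4 * i:4 * i + 4])
            for i in range(16)]


print(dot4_int8(10, [1, 2, 3, 4], [5, 6, 7, 8]))       # 80
print(dot_vector_512([0] * 16, [1] * 64, [2] * 64)[0])  # 8
```

Each vector instruction in this model performs 64 multiplies and 64 adds, eight times the operation count of a 64-bit FMA on eight lanes, which matches the 8x step from >2.7 to >21.6 in the chart.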
Level 1 Cache
L1 cache throughput maximizes core performance
Sustained throughput for 512-bit wide SIMD load
• An unaligned SIMD load that crosses a cache line keeps the same throughput
“Combined Gather” mechanism increases gather throughput
• Gather processing is important for real HPC applications
• A64FX introduces a “Combined Gather” mechanism that can return up to two consecutive elements in a “128-byte aligned block” simultaneously
[Figure: L1D$ with two read ports (Read port0, Read port1), each returning read data at 64B/cycle.]
[Figure: < Combined Gather >. Eight 8B elements (0 to 7) in a 128B block of memory are gathered into a register in the order 0 1 3 2 6 7 4 5 using four flows (flow-1 to flow-4). Throughput/Core [byte/cycle]: Gather (Normal) 16, Combined Gather 32 (2x faster).]
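The effect of Combined Gather can be approximated by counting L1D accesses ("flows") per gather. The model below is a simplified reading of the slide, not the hardware algorithm: it assumes elements are considered in fixed pairs (element 2k with element 2k+1) and that a pair costs one flow when both addresses fall in the same 128-byte aligned block.

```python
from typing import Sequence

BLOCK_BYTES = 128  # "128-byte aligned block" from the slide


def gather_flows(addresses: Sequence[int], combined: bool = True) -> int:
    """Count L1D accesses for one gather under the simplified pairing model."""
    flows = 0
    for k in range(0, len(addresses), 2):
        pair = addresses[k:k + 2]
        same_block = (len(pair) == 2
                      and pair[0] // BLOCK_BYTES == pair[1] // BLOCK_BYTES)
        # A combined pair costs one flow; otherwise each element costs one.
        flows += 1 if (combined and same_block) else len(pair)
    return flows


# Element order from the figure: 8-byte elements at slots 0 1 3 2 6 7 4 5.
figure_addresses = [slot * 8 for slot in (0, 1, 3, 2, 6, 7, 4, 5)]
print(gather_flows(figure_addresses, combined=False))  # 8
print(gather_flows(figure_addresses, combined=True))   # 4

# Elements spread over different blocks gain nothing from combining.
print(gather_flows([0, 128, 256, 384], combined=True))  # 4
```

For the pattern in the figure the flow count halves, consistent with the quoted step from 16 to 32 byte/cycle.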
Many-Core Architecture
A64FX consists of four CMGs (Core Memory Groups)
• A CMG consists of 13 cores, an L2 cache and a memory controller
  - One of the 13 cores is an assistant core, which handles daemons, I/O, etc.
• Four CMGs keep cache coherency by ccNUMA with an on-chip directory
• The X-bar connection in a CMG maximizes the throughput efficiency of the L2 cache
• Process binding in a CMG allows linear scalability up to 48 cores
On-chip network with a wide ring bus secures I/O performance
[Figure: CMG Configuration. 13 cores connect through an X-Bar Connection to the L2 cache (8MiB, 16-way), which connects to the Memory Controller (HBM2) and to the Network on Chip.]
[Figure: A64FX Chip Configuration. Four CMGs, each with its own HBM2, are connected by the Network on Chip (ring bus) to the Tofu Controller and the PCIe Controller.]
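The CMG layout lends itself to a small topology sketch for process binding. The core numbering below is hypothetical (cores numbered consecutively per CMG, with the last core of each CMG acting as the assistant core); actual core IDs on a real system may differ.

```python
from typing import Dict, List

CMGS = 4
CORES_PER_CMG = 13          # 12 computing cores + 1 assistant core
COMPUTE_PER_CMG = 12


def cmg_of(core_id: int) -> int:
    """CMG index for a core under the hypothetical consecutive numbering."""
    if not 0 <= core_id < CMGS * CORES_PER_CMG:
        raise ValueError("core_id out of range")
    return core_id // CORES_PER_CMG


def is_assistant(core_id: int) -> bool:
    """Assumption: the last core in each CMG is the assistant core."""
    return core_id % CORES_PER_CMG == CORES_PER_CMG - 1


def compute_cores(cmg: int) -> List[int]:
    """Computing cores that share one L2 cache and one memory controller."""
    base = cmg * CORES_PER_CMG
    return list(range(base, base + COMPUTE_PER_CMG))


def bind_processes(n_procs: int) -> Dict[int, List[int]]:
    """Bind each process to the computing cores of a single CMG, round-robin."""
    return {rank: compute_cores(rank % CMGS) for rank in range(n_procs)}


binding = bind_processes(4)
print(sum(len(cores) for cores in binding.values()))                   # 48
print(sum(is_assistant(c) for c in range(CMGS * CORES_PER_CMG)))       # 4
print(binding[1][:3])                                                  # [13, 14, 15]
```

Keeping each process inside one CMG means its threads share a single L2 cache and local HBM2 stack, which is the property the slide credits for linear scalability.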
High Bandwidth
Extremely high bandwidth in caches and memory
• A64FX has out-of-order mechanisms in cores, caches and memory controllers, which maximize the capability of each layer’s bandwidth
[Figure: memory hierarchy of one CMG (12x Computing Cores + 1x Assistant Core, L2 cache, HBM2), annotated with chip-level figures.]

Layer       Configuration                  Chip-level figure   BF ratio
Core        512-bit wide SIMD, 2x FMAs     >2.7TFLOPS          -
L1 Cache    L1D 64KiB, 4way                >11.0TB/s           4
L2 Cache    8MiB, 16way (per CMG)          >3.6TB/s            1.3
Memory      HBM2 8GiB (per CMG)            1024GB/s            ~0.37

Link bandwidths labeled in the figure: >230 GB/s and >115 GB/s between a core and its L1 cache, >115 GB/s and >57 GB/s between the L1 cache and the L2 cache, and 256 GB/s between the L2 cache and HBM2.
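The BF (bytes-per-FLOP) ratios quoted for each layer follow from dividing bandwidth by peak performance. The sketch below uses the lower bound of 2.7 TFLOPS as the divisor, which is an assumption, so the results are approximate.

```python
PEAK_TFLOPS = 2.7   # slide quotes ">2.7TFLOPS"; the lower bound is used here

# Chip-level bandwidths in TB/s as quoted on the slide.
BANDWIDTH_TB_S = {
    "L1 cache": 11.0,
    "L2 cache": 3.6,
    "Memory (HBM2)": 1.024,
}


def bf_ratio(bandwidth_tb_s: float, peak_tflops: float = PEAK_TFLOPS) -> float:
    """Bytes delivered per floating-point operation at peak."""
    return bandwidth_tb_s / peak_tflops


for layer, bw in BANDWIDTH_TB_S.items():
    print(f"{layer:14s} {bf_ratio(bw):.2f}")
# L1 cache       4.07
# L2 cache       1.33
# Memory (HBM2)  0.38
```

The memory result of about 0.38 is slightly above the quoted ~0.37, which is what one would expect if the actual peak is somewhat higher than 2.7 TFLOPS.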
Performance
A64FX boosts performance through microarchitectural enhancements, 512-bit wide SIMD, HBM2 and process technology
• > 2.5x faster in HPC/AI benchmarks than SPARC64 XIfx (Fujitsu’s previous HPC CPU)
• The results are based on the Fujitsu compiler optimized for our microarchitecture and SVE
[Figure: A64FX Benchmark Kernel Performance (Preliminary results). Bar chart normalized to SPARC64 XIfx (PRIMEHPC FX100), the baseline; axis ticks 0 to 8.
 Throughput (DGEMM / Stream): DGEMM 2.5TF (>90%); Stream Triad 830GB/s (>80%).
 HPC application kernels: Fluid dynamics, Atmosphere, Seismic wave propagation.
 AI application kernels: Convolution FP32, Convolution Low Precision (Estimated).
 Speedup labels: 2.5x, 2.8x, 3.0x, 3.4x, 9.4x.
 Annotated contributing features: 512-bit SIMD, INT8 dot product, L1$ B/W, L2$ B/W, Memory B/W, Combined Gather.]
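The efficiency percentages attached to the DGEMM and Stream Triad results can be reproduced from the quoted figures. This is a rough check only: the DGEMM divisor uses 2.7 TFLOPS, the lower bound of the quoted peak, so the true DGEMM efficiency is somewhat lower than the value printed.

```python
def efficiency(achieved: float, peak: float) -> float:
    """Fraction of peak that a benchmark reaches."""
    return achieved / peak


dgemm = efficiency(2.5, 2.7)          # TFLOPS achieved / quoted peak (lower bound)
stream = efficiency(830.0, 1024.0)    # GB/s achieved / peak memory bandwidth

print(f"DGEMM  {dgemm:.1%}")    # DGEMM  92.6%
print(f"Stream {stream:.1%}")   # Stream 81.1%
```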
Power Management
“Energy monitor” / “Energy analyzer” for activity-based power estimation
• Energy monitor (per chip) : Node power via Power API* (~msec)
  - Average power estimation of a node, CMG (cores, an L2 cache, a memory), etc.
• Energy analyzer (per core) : Power profiler via PAPI** (~nsec)
  - Fine-grained power analysis of a core, an L2 cache and a memory
→ Enabling chip-level power monitoring and detailed power analysis of applications
[Figure: <A64FX Energy monitor / Energy analyzer>. In each CMG (CMG#0 to CMG#3), power is calculated from core activity, L2 cache activity and memory activity; separate power calculations cover PCIe and Tofu. Labels: Energy Monitor, Energy Analyzer, Node, CMG, 12 Cores, Core, L2 cache, Memory, HBM2, Tofu, PCIe, Assistant core.]
* Sandia National Laboratories
** Performance Application Programming Interface
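Activity-based power estimation, in general terms, weights counted hardware events by an energy cost per event. The sketch below shows that idea only; the per-event weights are invented placeholders rather than A64FX data, and the function does not use the Power API or PAPI interfaces.

```python
from typing import Dict

# Hypothetical energy cost per activity event, in nanojoules. NOT A64FX data.
WEIGHTS_NJ: Dict[str, float] = {"core": 0.5, "l2": 2.0, "memory": 20.0}


def estimate_power_watts(activity_counts: Dict[str, int],
                         interval_s: float,
                         static_w: float = 0.0) -> float:
    """Average power over an interval: static power plus weighted activity."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    dynamic_joules = sum(WEIGHTS_NJ[unit] * count
                         for unit, count in activity_counts.items()) * 1e-9
    return static_w + dynamic_joules / interval_s


# Made-up counts for a 1 ms sampling interval.
sample = {"core": 2_000_000, "l2": 500_000, "memory": 100_000}
print(round(estimate_power_watts(sample, interval_s=1e-3, static_w=1.0), 3))  # 5.0
```

Sampling at millisecond intervals corresponds to the energy monitor's granularity on the slide; a profiler in the energy analyzer's role would apply the same weighting to much shorter windows.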
Power Management (Cont.)
“Power knob” for power optimization
• A64FX provides a power management function called “Power Knob”
  - Applications can change hardware configurations for power optimization
→ Power knobs and Energy monitor/analyzer will help users to optimize power consumption of their applications
<A64FX Power Knob Diagram>
[Figure: the core pipeline block diagram (52 cores) annotated with the available power knobs:]
• Decode width: 2
• EX pipeline usage: EXA only
• FL pipeline usage: FLA only
• HBM2 B/W adjustment (units of 10%)
• Frequency reduction
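The set of knobs listed above can be pictured as a small configuration record. The class below is purely illustrative: the field names, the 10% to 100% range for the HBM2 setting, and the validation are assumptions, and no real A64FX or operating-system interface is modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PowerKnobs:
    """Hypothetical container for the knob settings named on the slide."""
    decode_width_2: bool = False      # limit decode width to 2
    exa_only: bool = False            # use only the EXA integer pipeline
    fla_only: bool = False            # use only the FLA floating-point pipeline
    hbm2_bw_percent: int = 100        # adjustable in units of 10% (range assumed)
    reduced_frequency: bool = False

    def __post_init__(self) -> None:
        if not 10 <= self.hbm2_bw_percent <= 100 or self.hbm2_bw_percent % 10:
            raise ValueError("HBM2 B/W must be a multiple of 10 between 10 and 100")

    def active(self) -> List[str]:
        """Human-readable list of the knobs that deviate from full performance."""
        labels: List[str] = []
        if self.decode_width_2:
            labels.append("Decode width: 2")
        if self.exa_only:
            labels.append("EX pipeline usage: EXA only")
        if self.fla_only:
            labels.append("FL pipeline usage: FLA only")
        if self.hbm2_bw_percent < 100:
            labels.append(f"HBM2 B/W: {self.hbm2_bw_percent}%")
        if self.reduced_frequency:
            labels.append("Frequency reduction")
        return labels


memory_light = PowerKnobs(fla_only=True, hbm2_bw_percent=70)
print(memory_light.active())
```

A record like this would let an application describe a low-power configuration for a phase that is neither floating-point bound nor bandwidth bound.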
Fujitsu Mission Critical Technologies
Large systems require extensive RAS capability of CPU and interconnect
A64FX has mainframe-class RAS for integrity and stability. It contributes to a very low CPU failure rate and high system stability
• ECC or duplication for all caches
• Parity check for execution units
• Hardware instruction retry
• Hardware lane recovery for Tofu links
• ~128,400 error checkers in total

<A64FX RAS Mechanism>
Units             Error Detection and Correction
Cache (Tag)       ECC, Duplicate & Parity
Cache (Data)      ECC, Parity
Register          ECC (INT), Parity (Others)
Execution Unit    Parity, Residue
Core              Hardware Instruction Retry
Tofu              Hardware Lane Recovery

[Figure: <A64FX RAS Diagram>. Green: 1 bit error correctable; Yellow: 1 bit error detectable; Gray: 1 bit error harmless.]
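Parity and residue checks, listed above for the execution units, can be demonstrated in a few lines. This is a generic sketch of the two techniques, not a description of the A64FX circuits; the modulus 3 and the fault-injection parameter are assumptions for illustration.

```python
from typing import Optional

MODULUS = 3  # assumed check modulus; a single-bit flip always changes x mod 3


def parity(word: int) -> int:
    """Even/odd parity bit of a non-negative integer (1 if the popcount is odd)."""
    return bin(word).count("1") & 1


def checked_multiply(a: int, b: int, flip_bit: Optional[int] = None) -> int:
    """Multiply non-negative integers and verify the result with a residue check.

    flip_bit optionally injects a single-bit fault into the product.
    """
    result = a * b
    if flip_bit is not None:
        result ^= 1 << flip_bit
    expected = ((a % MODULUS) * (b % MODULUS)) % MODULUS
    if result % MODULUS != expected:
        raise ArithmeticError("residue mismatch: fault detected")
    return result


print(checked_multiply(12345, 678) == 12345 * 678)   # True
try:
    checked_multiply(12345, 678, flip_bit=5)
except ArithmeticError as err:
    print(err)                                       # residue mismatch: fault detected

word = 0b1011
print(parity(word), parity(word ^ 0b0100))           # 1 0
```

A detected mismatch is the kind of event that a mechanism such as hardware instruction retry could then act on by re-executing the operation.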
Software Development
RIKEN and Fujitsu are developing software stacks for the post-K computer
• Fujitsu compilers are optimized for the microarchitecture, maximizing SVE and HBM2 performance
We collaboratively work with RIKEN / Linaro / OSS communities / ISVs and contribute to the Arm HPC ecosystem
[Figure: post-K software stack, from top to bottom.]
Post-K Applications
Fujitsu Technical Computing Suite / RIKEN Advanced System Software
• Management Software
  - System management for high availability & power saving operation
  - Job management for higher system utilization & power efficiency
• File System
  - FEFS: Lustre-based distributed file system
  - LLIO: NVM-based File I/O accelerator
• Programming Environment
  - XcalableMP
  - MPI (Open MPI, MPICH)
  - OpenMP, COARRAY, Math Libs.
  - Compilers (C, C++, Fortran)
  - Debugging and tuning tools
Linux OS / McKernel (Lightweight Kernel)
Post-K System Hardware
Summary
A64FX is the first processor of the Armv8-A SVE architecture. It is used for the post-K computer
Fujitsu’s proven microarchitecture achieves high performance in HPC and AI areas
Fujitsu works collaboratively with partners and continuously contributes to the Arm ecosystem
We will continue to develop Arm processors
Abbreviations
A64FX
RSA: Reservation station for address generation
RSE: Reservation station for execution
RSBR: Reservation station for branch
PGPR: Physical general-purpose register
PFPR: Physical floating-point register
PPR: Physical predicate register
CSE: Commit stack entry
EAG: Effective address generator
EX : Integer execution unit
FL : Floating-point execution unit
PRX : Predicate execution unit
Tofu: Torus-Fusion

Contenu connexe

Tendances

Ls catalog thiet bi tu dong gm e_0908_dienhathe.vn
Ls catalog thiet bi tu dong gm e_0908_dienhathe.vnLs catalog thiet bi tu dong gm e_0908_dienhathe.vn
Ls catalog thiet bi tu dong gm e_0908_dienhathe.vnDien Ha The
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCinside-BigData.com
 
Jetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous MachinesJetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous MachinesDustin Franklin
 
Snapdragon SoC and ARMv7 Architecture
Snapdragon SoC and ARMv7 ArchitectureSnapdragon SoC and ARMv7 Architecture
Snapdragon SoC and ARMv7 ArchitectureSantosh Verma
 
RISC-V and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
RISC-V  and OpenPOWER open-ISA and open-HW - a swiss army knife for HPCRISC-V  and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
RISC-V and OpenPOWER open-ISA and open-HW - a swiss army knife for HPCGanesan Narayanasamy
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit SupercomputerVigneshwarRamaswamy
 
AMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World RecordsAMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World RecordsAMD
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesAMD
 
AMD Naples CPU for Data Center
AMD Naples CPU for Data CenterAMD Naples CPU for Data Center
AMD Naples CPU for Data CenterLow Hong Chuan
 
An Introduction to Draganflyer X8 UAV
An Introduction to  Draganflyer X8 UAV An Introduction to  Draganflyer X8 UAV
An Introduction to Draganflyer X8 UAV Dr. Mohieddin Moradi
 
X3850x5techpresentation09 29-2010-101118124714-phpapp01
X3850x5techpresentation09 29-2010-101118124714-phpapp01X3850x5techpresentation09 29-2010-101118124714-phpapp01
X3850x5techpresentation09 29-2010-101118124714-phpapp01Yalçin KARACA
 
Fpga video capturing
Fpga video capturingFpga video capturing
Fpga video capturingshehryar88
 
Lame ps700 power 7 per sito
Lame ps700 power 7 per sitoLame ps700 power 7 per sito
Lame ps700 power 7 per sitoLorenzo Corbetta
 
Nanos and Nanos2 modules catalog september 2014
Nanos and Nanos2 modules catalog september 2014Nanos and Nanos2 modules catalog september 2014
Nanos and Nanos2 modules catalog september 2014ETEP
 
MYC-C7Z015 CPU Module
MYC-C7Z015 CPU ModuleMYC-C7Z015 CPU Module
MYC-C7Z015 CPU ModuleLinda Zhang
 

Tendances (19)

Ls catalog thiet bi tu dong gm e_0908_dienhathe.vn
Ls catalog thiet bi tu dong gm e_0908_dienhathe.vnLs catalog thiet bi tu dong gm e_0908_dienhathe.vn
Ls catalog thiet bi tu dong gm e_0908_dienhathe.vn
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
 
Zynq ultrascale
Zynq ultrascaleZynq ultrascale
Zynq ultrascale
 
Jetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous MachinesJetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous Machines
 
Snapdragon SoC and ARMv7 Architecture
Snapdragon SoC and ARMv7 ArchitectureSnapdragon SoC and ARMv7 Architecture
Snapdragon SoC and ARMv7 Architecture
 
RISC-V and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
RISC-V  and OpenPOWER open-ISA and open-HW - a swiss army knife for HPCRISC-V  and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
RISC-V and OpenPOWER open-ISA and open-HW - a swiss army knife for HPC
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit Supercomputer
 
uCluster
uClusteruCluster
uCluster
 
AMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World RecordsAMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World Records
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
 
AMD Naples CPU for Data Center
AMD Naples CPU for Data CenterAMD Naples CPU for Data Center
AMD Naples CPU for Data Center
 
SDC Server Sao Jose
SDC Server Sao JoseSDC Server Sao Jose
SDC Server Sao Jose
 
Power7 family quick ref
Power7 family quick refPower7 family quick ref
Power7 family quick ref
 
An Introduction to Draganflyer X8 UAV
An Introduction to  Draganflyer X8 UAV An Introduction to  Draganflyer X8 UAV
An Introduction to Draganflyer X8 UAV
 
X3850x5techpresentation09 29-2010-101118124714-phpapp01
X3850x5techpresentation09 29-2010-101118124714-phpapp01X3850x5techpresentation09 29-2010-101118124714-phpapp01
X3850x5techpresentation09 29-2010-101118124714-phpapp01
 
Fpga video capturing
Fpga video capturingFpga video capturing
Fpga video capturing
 
Lame ps700 power 7 per sito
Lame ps700 power 7 per sitoLame ps700 power 7 per sito
Lame ps700 power 7 per sito
 
Nanos and Nanos2 modules catalog september 2014
Nanos and Nanos2 modules catalog september 2014Nanos and Nanos2 modules catalog september 2014
Nanos and Nanos2 modules catalog september 2014
 
MYC-C7Z015 CPU Module
MYC-C7Z015 CPU ModuleMYC-C7Z015 CPU Module
MYC-C7Z015 CPU Module
 

Similaire à Fujitsu Presents Post-K CPU Specifications

SUN主机产品介绍.ppt
SUN主机产品介绍.pptSUN主机产品介绍.ppt
SUN主机产品介绍.pptPencilData
 
April 2014 IBM announcement webcast
April 2014 IBM announcement webcastApril 2014 IBM announcement webcast
April 2014 IBM announcement webcastHELP400
 
08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer FugakuRCCSRENKEI
 
Hpe Proliant DL20 Gen10 Server Datasheet
Hpe Proliant DL20 Gen10 Server DatasheetHpe Proliant DL20 Gen10 Server Datasheet
Hpe Proliant DL20 Gen10 Server Datasheet美兰 曾
 
Hpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server DatasheetHpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server Datasheet美兰 曾
 
4 p9 architecture overview japan meetup
4 p9 architecture overview japan meetup4 p9 architecture overview japan meetup
4 p9 architecture overview japan meetupYutaka Kawai
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overviewlambertt
 
한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson PlatformHANCOM MDS
 
Synopsys User Group Presentation
Synopsys User Group PresentationSynopsys User Group Presentation
Synopsys User Group Presentationemlawgr
 
If AMD Adopted OMI in their EPYC Architecture
If AMD Adopted OMI in their EPYC ArchitectureIf AMD Adopted OMI in their EPYC Architecture
If AMD Adopted OMI in their EPYC ArchitectureAllan Cantle
 
PENTIUM - PRO MICROPROCESSORS MP SY.pptx
PENTIUM - PRO MICROPROCESSORS MP SY.pptxPENTIUM - PRO MICROPROCESSORS MP SY.pptx
PENTIUM - PRO MICROPROCESSORS MP SY.pptxSanjayBhosale20
 
IBM Flex System p24L, p260 and p460 Compute Nodes
IBM Flex System p24L, p260 and p460 Compute NodesIBM Flex System p24L, p260 and p460 Compute Nodes
IBM Flex System p24L, p260 and p460 Compute NodesIBM India Smarter Computing
 
Industry’s performance leading ultra low-power dsp solution
Industry’s performance leading ultra low-power dsp solutionIndustry’s performance leading ultra low-power dsp solution
Industry’s performance leading ultra low-power dsp solutionAnalog Devices, Inc.
 
Introduce: IBM Power Linux with PowerKVM
Introduce: IBM Power Linux with PowerKVMIntroduce: IBM Power Linux with PowerKVM
Introduce: IBM Power Linux with PowerKVMZainal Abidin
 
IBM System x3850 X5 Technical Presentation
IBM System x3850 X5 Technical PresentationIBM System x3850 X5 Technical Presentation
IBM System x3850 X5 Technical PresentationCliff Kinard
 

Similaire à Fujitsu Presents Post-K CPU Specifications (20)

SUN主机产品介绍.ppt
SUN主机产品介绍.pptSUN主机产品介绍.ppt
SUN主机产品介绍.ppt
 
Phytium 64 core cpu preview
Phytium 64 core cpu previewPhytium 64 core cpu preview
Phytium 64 core cpu preview
 
Yeni Nesil Sunucular ile Veritabanınız
Yeni Nesil Sunucular ile VeritabanınızYeni Nesil Sunucular ile Veritabanınız
Yeni Nesil Sunucular ile Veritabanınız
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
 
April 2014 IBM announcement webcast
April 2014 IBM announcement webcastApril 2014 IBM announcement webcast
April 2014 IBM announcement webcast
 
08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer Fugaku
 
Hpe Proliant DL20 Gen10 Server Datasheet
Hpe Proliant DL20 Gen10 Server DatasheetHpe Proliant DL20 Gen10 Server Datasheet
Hpe Proliant DL20 Gen10 Server Datasheet
 
Hpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server DatasheetHpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server Datasheet
 
4 p9 architecture overview japan meetup
4 p9 architecture overview japan meetup4 p9 architecture overview japan meetup
4 p9 architecture overview japan meetup
 
I7 processor
I7 processorI7 processor
I7 processor
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
 
한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
 
Synopsys User Group Presentation
Synopsys User Group PresentationSynopsys User Group Presentation
Synopsys User Group Presentation
 
If AMD Adopted OMI in their EPYC Architecture
If AMD Adopted OMI in their EPYC ArchitectureIf AMD Adopted OMI in their EPYC Architecture
If AMD Adopted OMI in their EPYC Architecture
 
PENTIUM - PRO MICROPROCESSORS MP SY.pptx
PENTIUM - PRO MICROPROCESSORS MP SY.pptxPENTIUM - PRO MICROPROCESSORS MP SY.pptx
PENTIUM - PRO MICROPROCESSORS MP SY.pptx
 
IBM Flex System p24L, p260 and p460 Compute Nodes
IBM Flex System p24L, p260 and p460 Compute NodesIBM Flex System p24L, p260 and p460 Compute Nodes
IBM Flex System p24L, p260 and p460 Compute Nodes
 
Industry’s performance leading ultra low-power dsp solution
Industry’s performance leading ultra low-power dsp solutionIndustry’s performance leading ultra low-power dsp solution
Industry’s performance leading ultra low-power dsp solution
 
Introduce: IBM Power Linux with PowerKVM
Introduce: IBM Power Linux with PowerKVMIntroduce: IBM Power Linux with PowerKVM
Introduce: IBM Power Linux with PowerKVM
 
IBM System x3850 X5 Technical Presentation
IBM System x3850 X5 Technical PresentationIBM System x3850 X5 Technical Presentation
IBM System x3850 X5 Technical Presentation
 

Plus de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Plus de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 



Fujitsu Presents Post-K CPU Specifications

  • 1. Fujitsu High Performance CPU for the Post-K Computer
      August 21st, 2018
      Toshio Yoshida, FUJITSU LIMITED
      All Rights Reserved. Copyright © FUJITSU LIMITED 2018
  • 2. Key Message
      A64FX is the new Fujitsu-designed Arm processor
        • It is used in the post-K computer
      A64FX is the first processor of the Armv8-A SVE architecture
        • Fujitsu, as a lead partner, collaborated closely with Arm on the development of SVE
      A64FX achieves high performance in HPC and AI areas
        • Our own microarchitecture maximizes the capability of SVE
  • 3. Outline
      Fujitsu Processor Development
      A64FX
        • Overview
        • Microarchitecture
        • Performance
        • Power Management
        • RAS
      Software Development
      Summary
  • 4. Fujitsu Processor Development
      Persistent Evolution > 60 years
      [Timeline figure spanning 2000~2003, 2004~2007, 2008~2011, 2012~2017 and 2018~, with three product lines: Mainframe, UNIX and HPC.]
      Mainframe: GS8600, GS8800, GS8800B, GS8900, GS21 600, GS21 900, GS21 1600, GS21 2600, GS21 3600, Next GS
      UNIX: SPARC64, SPARC64 II, SPARC64 GP, SPARC64 V, SPARC64 V+, SPARC64 VI, SPARC64 VII, SPARC64 X, SPARC64 X+, SPARC64 XII, Next SPARC
      HPC: SPARC64 VIIIfx, SPARC64 IXfx, SPARC64 XIfx, A64FX (Post-K*, Arm)
      Performance features: Store Ahead, Branch History, Prefetch, Single-chip CPU, Non-Blocking $, O-O-O Execution, Super-Scalar, L2$ on Die, HPC-ACE, System on Chip, Hardware Barrier, Multi-core Multi-thread, Virtual Machine Architecture, Software on Chip, High-speed Interconnect
      Reliability features: $ ECC, Register/ALU Parity, Instruction Retry, $ Dynamic Degradation, Error Checkers/History
      Technology generations: 350nm, 250nm / 220nm, 180nm, 130nm, 90nm, 65nm, 45nm, 40nm, 28nm, 20nm, 7nm
      Other labels: Tr=1B, CMOS Cu, HPC+AI, DLU
      A64FX callouts: Scalable Many-core Architecture; 512-bit SIMD for HPC and AI; High Bandwidth Memory
      * Post-K is under development by RIKEN and Fujitsu
  • 5. DNA of Fujitsu Processors
      A64FX inherits DNA from Fujitsu technologies used in the mainframes, UNIX and HPC servers
        • High reliability: Stability, Integrity, Continuity
        • High performance-per-watt: Execution and memory throughput, Low power, Massively parallel
        • High speed & flexibility: Thread performance, Software on Chip, Large SMP
      A64FX: CPU w/ extremely high throughput
        • High performance
        • Massively parallel
        • Low power
        • Stability and integrity
      [Photo © RIKEN]
  • 6. A64FX Designed for HPC/AI
      A64FX = CPU with extremely high throughput
      Design goals: 1. High Performance, 2. High Throughput, 3. High Efficiency, 4. Standard
        • Vector: 512-bit wide SIMD x 2 pipes/core
        • Memory: HBM2 (extremely high B/W)
        • Scalable: 48 cores, Tofu interconnect
        • Performance: (D|S|H)GEMM >90%, Stream Triad >80%
        • Perf-per-watt >> General purpose CPU
        • Various data types (FP64/32/16, INT64/32/16/8)
        • HPC/AI apps. >> General purpose CPU
        • Binary compatibility with Armv8.2-A + SVE + SBSA* level 3
      *Arm's "Server Base System Architecture"
  • 7. A64FX Chip Overview
      Architecture Features
        • Armv8.2-A (AArch64 only)
        • SVE 512-bit wide SIMD
        • 48 computing cores + 4 assistant cores*
        • HBM2 32GiB
        • Tofu 6D Mesh/Torus, 28Gbps x 2 lanes x 10 ports
        • PCIe Gen3 16 lanes
      7nm FinFET
        • 8,786M transistors
        • 594 package signal pins
      Peak Performance (Efficiency)
        • >2.7TFLOPS (>90%@DGEMM)
        • Memory B/W 1024GB/s (>80%@Stream Triad)
      *All the cores are identical
      [Block diagram <A64FX>: Network on Chip, four HBM2 stacks, PCIe controller, Tofu controller.]
      CMG specification: 13 cores; L2$ 8MiB; Mem 8GiB, 256GB/s
      I/O: PCIe Gen3 16 lanes
      Tofu: 28Gbps, 2 lanes, 10 ports
      Comparison with the previous HPC CPU:
                            A64FX (Post-K)    SPARC64 XIfx (PRIMEHPC FX100)
        ISA (Base)          Armv8.2-A         SPARC-V9
        ISA (Extension)     SVE               HPC-ACE2
        Process Node        7nm               20nm
        Peak Performance    >2.7TFLOPS        1.1TFLOPS
        SIMD                512-bit           256-bit
        # of Cores          48+4              32+2
        Memory              HBM2              HMC
        Memory Peak B/W     1024GB/s          240GB/s x2 (in/out)
  • 8. A64FX Features
      Collaboration with Arm to develop and optimize SVE for a wide range of applications
        • FP16 and INT16/8 dot product are introduced for AI applications
      Feature comparison ("Yes" marks a supported feature; blank means not listed):
                                        A64FX (Post-K)           SPARC64 XIfx (PRIMEHPC FX100)   SPARC64 VIIIfx (K computer)
        ISA                             Armv8.2-A + SVE          SPARC-V9 + HPC-ACE2             SPARC-V9 + HPC-ACE
        SIMD Width                      512-bit                  256-bit                         128-bit
        Four-operand FMA                Yes, Enhanced            Yes                             Yes
        Gather/Scatter                  Yes, Enhanced            Yes
        Predicated Operations           Yes, Enhanced            Yes                             Yes
        Math. Acceleration              Yes, Further enhanced    Yes, Enhanced                   Yes
        Compress                        Yes, Enhanced            Yes
        First Fault Load                Yes, New
        FP16                            Yes, New
        INT16/INT8 Dot Product          Yes, New
        HW Barrier* / Sector Cache*     Yes, Further enhanced    Yes, Enhanced                   Yes
      * Utilizing AArch64 implementation-defined system registers
  • 9. A64FX Core Pipeline
      A64FX enhances and inherits superior features of SPARC64
        • Inherits superscalar, out-of-order, branch prediction, etc.
        • Enhances SIMD and predicate operations
          - 2x 512-bit wide SIMD FMA + Predicate Operation + 4x ALU (shared w/ 2x AGEN)
          - 2x 512-bit wide SIMD load or 512-bit wide SIMD store
      [Pipeline diagram, 52 cores. Stages: Fetch, Issue, Dispatch, Reg-Read, Execute, Cache and Memory, Commit. Blocks: L1 I$, Branch Predictor, Decode & Issue, RSE0, RSE1, RSA, RSBR, PGPR, PFPR, PPR, EXA, EXB, EXC, EXD, EAGA, EAGB, FLA, FLB, PRX, Fetch Port, Store Port, Write Buffer, L1D$, L2$, CSE, PC, Control Registers, HBM2 Controller, HBM2, Tofu controller, Tofu Interconnect, PCI Controller, PCI-GEN3. The legend marks the main enhanced blocks.]
  • 10. Four-operand FMA with Prefix Instruction
      MOVPRFX as a prefix instruction
        • For SVE, a four-operand "FMA4" requires a prefix instruction (MOVPRFX) followed by a destructive three-operand FMA3
      A64FX implementation for MOVPRFX
        • A64FX hides this overhead in its main pipeline by packing MOVPRFX and the following instruction into a single operation
      "FMA4": Z0 <= Z3 + Z1 * Z2
      Equivalent two-instruction sequence:
        MOVPRFX: Z0 <= Z3
        FMA3:    Z0 <= Z0 + Z1 * Z2 (destructive)
      [Diagram: instruction flow through L1I$/L2$/Memory, Inst. Buffer, Pre-Decode, Inst. Decode, Exec. Pipeline, CSE and PC. MOVPRFX and FMA3 enter as two instructions and are packed into a single "FMA4" operation.]
  • 11. Execution Unit
      Extremely high throughput
        • 512-bit wide SIMD x 2 Pipelines x 48 Cores
        • >90% execution efficiency in (D|S|H)GEMM and INT16/8 dot product
      Peak Performance (Chip-level), in TOPS, by element size:
                                                       64-bit    32-bit    16-bit    8-bit
        SPARC64 VIIIfx (K computer), 128-bit SIMD      0.128     0.128     N/A       N/A
        SPARC64 XIfx (PRIMEHPC FX100), 256-bit SIMD    1.1       2.2       N/A       N/A
        A64FX (Post-K), 512-bit SIMD                   >2.7      >5.4      >10.8     >21.6
      Workloads named on the chart: DGEMM, SGEMM, HGEMM, INT16 dot product, INT8 dot product
      [Diagram of the INT8 dot product: four 8-bit elements A0..A3 are multiplied by four 8-bit elements B0..B3 and the products are accumulated into a 32-bit element C.]
  • 12. Level 1 Cache
      L1 cache throughput maximizes core performance
      Sustained throughput for 512-bit wide SIMD load
        • An unaligned SIMD load that crosses a cache line keeps the same throughput
      "Combined Gather" mechanism increases gather throughput
        • Gather processing is important for real HPC applications
        • A64FX introduces a "Combined Gather" mechanism that can return up to two consecutive elements in a "128-byte aligned block" simultaneously
      [Diagram: L1D$ with Read port0 and Read port1, returning Read data0 and Read data1 at 64B/cycle each.]
      [Diagram <Combined Gather>: eight 8B elements (0 to 7) in a 128B memory block are gathered into a register in four flows (flow-1 to flow-4).]
      Throughput/Core [byte/cycle]: Gather (Normal) 16, Combined Gather 32 (2x faster)
  • 13. Many-Core Architecture
      A64FX consists of four CMGs (Core Memory Group)
        • A CMG consists of 13 cores, an L2 cache and a memory controller
          - One out of 13 cores is an assistant core which handles daemon, I/O, etc.
        • Four CMGs keep cache coherency by ccNUMA with an on-chip directory
        • The X-bar connection in a CMG maximizes the throughput efficiency of the L2 cache
        • Process binding in a CMG allows linear scalability up to 48 cores
      An on-chip network with a wide ring bus secures I/O performance
      [Diagram, CMG Configuration: 13 cores, X-Bar Connection, L2 cache 8MiB 16-way, Memory Controller, Network on Chip.]
      [Diagram, A64FX Chip Configuration: four CMGs, each with its own HBM2, connected by the Network on Chip (Ring bus) to the Tofu Controller and the PCIe Controller.]
  • 14. High Bandwidth
      Extremely high bandwidth in caches and memory
        • A64FX has out-of-order mechanisms in cores, caches and memory controllers. It maximizes the capability of each layer's bandwidth
      Chip-level figures (Performance >2.7TFLOPS):
        L1 Cache    >11.0TB/s    (BF ratio = 4)
        L2 Cache    >3.6TB/s     (BF ratio = 1.3)
        Memory      1024GB/s     (BF ratio =~0.37)
      [Diagram of one CMG: 12x Computing Cores + 1x Assistant Core, each core with 512-bit wide SIMD 2x FMAs and an L1D of 64KiB, 4way; a shared L2 Cache of 8MiB, 16way; HBM2 8GiB at 256 GB/s. Other link labels: >230 GB/s and >115 GB/s; >115 GB/s and >57 GB/s.]
  • 15. Performance
      A64FX boosts performance through microarchitectural enhancements, 512-bit wide SIMD, HBM2 and process technology
        • > 2.5x faster in HPC/AI benchmarks than SPARC64 XIfx (Fujitsu's previous HPC CPU)
        • The results are based on the Fujitsu compiler optimized for our microarchitecture and SVE
      [Bar chart: A64FX Benchmark Kernel Performance (Preliminary results), normalized to SPARC64 XIfx (PRIMEHPC FX100), axis 0 to 8. Throughput kernels: DGEMM, Stream Triad. HPC application kernels: Fluid dynamics, Atmosphere, Seismic wave propagation. AI application kernels: Convolution FP32, Convolution Low Precision (Estimated). Annotations: 2.5TF (>90%) for DGEMM and 830 GB/s (>80%) for Stream Triad; speedup labels 2.5x, 9.4x, 3.4x, 2.8x and 3.0x; contributing factors 512-bit SIMD, INT8 dot product, Memory B/W, L2$ B/W, L1$ B/W and Combined Gather.]
  • 16. Power Management
      "Energy monitor" / "Energy analyzer" for activity-based power estimation
      Energy monitor (per chip): Node power via Power API* (~msec)
        • Average power estimation of a node, CMG (cores, an L2 cache, a memory), etc.
      Energy analyzer (per core): Power profiler via PAPI** (~nsec)
        • Fine-grained power analysis of a core, an L2 cache and a memory
      → Enabling chip-level power monitoring and detailed power analysis of applications
      [Diagram <A64FX Energy monitor / Energy analyzer>: each of CMG#0 to CMG#3 (12 cores, assistant core, L2 cache, HBM2) feeds power calculations from core activity, L2 cache activity and memory activity; separate power calculations cover Tofu and PCIe. The Energy Analyzer reports Core, L2 Cache and Memory; the Energy Monitor reports Node and CMG.]
      * Sandia National Laboratories
      ** Performance Application Programming Interface
  • 17. Power Management (Cont.)
      "Power knob" for power optimization
      A64FX provides a power management function called "Power Knob"
        • Applications can change hardware configurations for power optimization
      → Power knobs and the Energy monitor/analyzer will help users to optimize the power consumption of their applications
      Power knobs shown in <A64FX Power Knob Diagram> (52 cores):
        • Decode width: 2
        • EX pipeline usage: EXA only
        • FL pipeline usage: FLA only
        • HBM2 B/W adjustment (units of 10%)
        • Frequency reduction
      [The knobs are overlaid on the core pipeline diagram from slide 9.]
  • 18. Fujitsu Mission Critical Technologies
      Large systems require extensive RAS capability of CPU and interconnect
      A64FX has a mainframe class RAS for integrity and stability. It contributes to very low CPU failure rate and high system stability
        • ECC or duplication for all caches
        • Parity check for execution units
        • Hardware instruction retry
        • Hardware lane recovery for Tofu links
        • ~128,400 error checkers in total
      <A64FX RAS Mechanism>
        Units             Error Detection and Correction
        Cache (Tag)       ECC, Duplicate & Parity
        Cache (Data)      ECC, Parity
        Register          ECC (INT), Parity (Others)
        Execution Unit    Parity, Residue
        Core              Hardware Instruction Retry
        Tofu              Hardware Lane Recovery
      [Diagram <A64FX RAS Diagram>. Legend: Green = 1 bit error Correctable; Yellow = 1 bit error Detectable; Gray = 1 bit error harmless.]
  • 19. Software Development
      RIKEN and Fujitsu are developing software stacks for the post-K computer
        • Fujitsu compilers are optimized for the microarchitecture, maximizing SVE and HBM2 performance
      We collaboratively work with RIKEN / Linaro / OSS communities / ISVs and contribute to the Arm HPC ecosystem
      Software stack (top to bottom):
        Post-K Applications
        Fujitsu Technical Computing Suite / RIKEN Advanced System Software
          • Management Software
            - System management for high availability & power saving operation
            - Job management for higher system utilization & power efficiency
          • Programming Environment
            - XcalableMP
            - MPI (Open MPI, MPICH)
            - OpenMP, COARRAY, Math Libs.
            - Compilers (C, C++, Fortran)
            - Debugging and tuning tools
          • File System
            - FEFS: Lustre-based distributed file system
            - LLIO: NVM-based File I/O accelerator
        Linux OS / McKernel (Lightweight Kernel)
        Post-K System Hardware
  • 20. Summary
      A64FX is the first processor of the Armv8-A SVE architecture. It is used for the post-K computer
      Fujitsu's proven microarchitecture achieves high performance in HPC and AI areas
      Fujitsu collaboratively works with partners and continuously contributes to the Arm ecosystem
      We will continue to develop Arm processors
  • 22. Abbreviations (A64FX)
      RSA: Reservation station for address generation
      RSE: Reservation station for execution
      RSBR: Reservation station for branch
      PGPR: Physical general-purpose register
      PFPR: Physical floating-point register
      PPR: Physical predicate register
      CSE: Commit stack entry
      EAG: Effective address generator
      EX: Integer execution unit
      FL: Floating-point execution unit
      PRX: Predicate execution unit
      Tofu: Torus-Fusion
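
Slides 7, 11 and 14 quote peak throughput and byte-per-flop (B/F) ratios without showing the arithmetic behind them. The Python sketch below is not part of the deck; it only recomputes those ratios from the quoted figures. The function names are hypothetical, an FMA is counted as two operations (the usual convention, assumed here), and the clock value is merely what ">2.7TFLOPS" implies as a lower bound, since the deck does not state a frequency.

```python
# Figures quoted on the slides; values written ">" on the slides are lower bounds.
CORES = 48                  # computing cores (assistant cores excluded)
SIMD_BITS = 512             # SVE vector width on A64FX
FMA_PIPES = 2               # SIMD FMA pipelines per core
PEAK_FP64_FLOPS = 2.7e12    # ">2.7TFLOPS"


def ops_per_cycle(element_bits: int) -> int:
    """Chip-level operations per cycle for a given element size.

    lanes per vector x 2 operations per FMA x pipelines x cores.
    """
    lanes = SIMD_BITS // element_bits
    return lanes * 2 * FMA_PIPES * CORES


def bf_ratio(bandwidth_bytes_per_s: float,
             flops: float = PEAK_FP64_FLOPS) -> float:
    """Bytes of bandwidth available per floating-point operation."""
    return bandwidth_bytes_per_s / flops


fp64_ops = ops_per_cycle(64)                      # 8 lanes * 2 * 2 * 48 = 1536
implied_clock_hz = PEAK_FP64_FLOPS / fp64_ops     # about 1.76e9 (assumption-based)

l1_bf = bf_ratio(11.0e12)    # about 4.1; slide 14 quotes "BF ratio = 4"
l2_bf = bf_ratio(3.6e12)     # about 1.3; slide 14 quotes "BF ratio = 1.3"
mem_bf = bf_ratio(1024e9)    # about 0.38; slide 14 quotes "~0.37"
```

The memory ratio comes out slightly above the quoted ~0.37 because 2.7TFLOPS is itself a lower bound on the peak. Halving the element size doubles `ops_per_cycle`, which matches the >2.7, >5.4, >10.8 and >21.6 progression on slide 11.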
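
Slide 10 states that a four-operand "FMA4" is expressed in SVE as MOVPRFX followed by a destructive three-operand FMA3. A minimal Python model of that equivalence follows. It is an illustration only: registers are plain lists of floats, predication is ignored, the multiply and add are rounded separately rather than fused, and the function names `movprfx`, `fma3_destructive` and `fma4` are hypothetical helpers, not instruction-accurate emulation.

```python
from typing import Dict, List

Registers = Dict[str, List[float]]


def movprfx(regs: Registers, dst: str, src: str) -> None:
    """MOVPRFX: copy the source vector into the destination register."""
    regs[dst] = list(regs[src])


def fma3_destructive(regs: Registers, acc: str, a: str, b: str) -> None:
    """Destructive FMA3: acc <= acc + a * b, overwriting the accumulator."""
    regs[acc] = [c + x * y for c, x, y in zip(regs[acc], regs[a], regs[b])]


def fma4(regs: Registers, dst: str, addend: str, a: str, b: str) -> None:
    """Four-operand form: dst <= addend + a * b, leaving addend untouched."""
    regs[dst] = [c + x * y for c, x, y in zip(regs[addend], regs[a], regs[b])]


def fresh_registers() -> Registers:
    """Example register contents (arbitrary values chosen for illustration)."""
    return {
        "Z0": [0.0, 0.0, 0.0, 0.0],
        "Z1": [1.0, 2.0, 3.0, 4.0],
        "Z2": [5.0, 6.0, 7.0, 8.0],
        "Z3": [0.5, 1.5, 2.5, 3.5],
    }


# Two-instruction sequence from the slide: MOVPRFX Z0 <= Z3, then FMA3.
two_step = fresh_registers()
movprfx(two_step, "Z0", "Z3")
fma3_destructive(two_step, "Z0", "Z1", "Z2")

# Single four-operand operation: Z0 <= Z3 + Z1 * Z2.
one_step = fresh_registers()
fma4(one_step, "Z0", "Z3", "Z1", "Z2")
```

Both sequences leave the same value in Z0 and leave Z3 intact, which is the property that lets the hardware treat the packed pair as one operation.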
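
The INT8 dot product pictured on slide 11 multiplies four 8-bit elements pairwise and accumulates the products into one 32-bit element. The sketch below models that per-lane behaviour in Python. It assumes signed 8-bit inputs and wraparound signed 32-bit accumulation, neither of which the slide spells out, and `int8_dot_accumulate` is a hypothetical name for illustration.

```python
from typing import List


def wrap_int32(value: int) -> int:
    """Wrap an integer to signed 32-bit two's complement (assumed overflow rule)."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


def int8_dot_accumulate(acc: List[int], a: List[int], b: List[int]) -> List[int]:
    """For each 32-bit lane i: acc[i] + sum of a[4i+k] * b[4i+k] for k in 0..3."""
    if len(a) != len(b) or len(a) != 4 * len(acc):
        raise ValueError("need four 8-bit elements per 32-bit accumulator lane")
    for v in a + b:
        if not -128 <= v <= 127:
            raise ValueError("inputs must fit in a signed 8-bit element")
    result = []
    for lane, c in enumerate(acc):
        base = 4 * lane
        products = sum(a[base + k] * b[base + k] for k in range(4))
        result.append(wrap_int32(c + products))
    return result


# Two lanes shown for brevity; a 512-bit vector would hold 64 int8 elements
# feeding 16 int32 accumulator lanes.
example = int8_dot_accumulate(
    [10, 0],
    [1, 2, 3, 4, -1, -2, -3, -4],
    [5, 6, 7, 8, 5, 6, 7, 8],
)
```

Because each 32-bit lane absorbs four multiply-accumulates, the operation count per vector is four times that of a 32-bit FMA, consistent with the 8-bit peak on slide 11 being four times the 32-bit peak.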