SlideShare une entreprise Scribd logo
1  sur  23
Télécharger pour lire hors ligne
Mars: A 64-core ARMv8 Processor
Charles Zhang
Phytium Technology Co., Ltd
Phytium Technology Co., Ltd
Statements
The following slides are presented to introduce the
general features of one of our products, instead of any
commitment about it. It is for information purposes
only, and may not be incorporated into any contract. It
is not suggested to make purchasing decisions
accordingly. The development, release, and timing of
any features or functionality described here remains at
the sole discretion of Phytium.
2
Phytium Technology Co., Ltd
A Brief Introduction of Phytium
 China corporation, founded in 2012
 Guangzhou
 Tianjin
 Vision
 Leading edge CPU and ASIC provider in China
 Market focuses on chips for
 Internet & Cloud Computing infrastructure
 Traditional workload mainframe servers
3
Phytium Technology Co., Ltd
China is a Fast-growing Server Market
4
Company
1Q15
Revenue
1Q15 Market
Share (%)
1Q14
Revenue
1Q14 Market
Share (%)
1Q15-1Q14
Growth (%)
HP 3,191,694,948 23.8 2,890,992,229 25.5 10.4
Dell 2,296,473,026 17.1 2,006,639,006 17.7 14.4
IBM 1,887,939,141 14.1 2,244,631,789 19.8 -15.9
Lenovo 970,254,659 7.2 127,973,470 1.1 658.2
Cisco 890,179,930 6.6 616,620,000 5.4 44.4
Others 4,157,871,704 31.0 3,469,383,444 30.6 19.8
Total 13,394,413,409 100.0 11,356,239,939 100.0 17.9
Company
1Q15
Revenue
1Q15 Market
Share (%)
1Q14
Revenue
1Q14 Market
Share (%)
1Q15-1Q14
Growth (%)
Inspur 332,613,480 21 227,328,256 17 46
Dell 322,063,140 20 246,281,271 19 31
Lenovo 295,914,571 18 80,084,826 6 270
HP 217,487,450 14 167,775,923 13 30
Huawei 197,490,419 12 189,963,266 14 4
Sugon 140,377.091 9 70,705,366 5 99
Others 104,566,737 6 329,549,621 25 -68
Total 1,610,512,888 100.0 1,311,688,529 100.0 23
Source: Gartner (May 2015)
China
WW
Phytium Technology Co., Ltd
What is Mars for?
5
High performance
High volume of memory
High bandwidth memory access
High bandwidth I/O access
Large scale cache coherency maintained
Moderate performance
High power efficiency
High density computing
High bandwidth memory
access
Low cost
Mars
Earth
Phytium Technology Co., Ltd
Mars Overview
 Architecture Features
 64 Xiaomi cores, ARMv8
compatible
 Hardware-maintained global
cache coherency
 Panel-based data affinity
architecture
 Mesh topology on chip network
 32MB L2 cache
 8 Cache & Memory Chips (CMC)
 128MB L3 cache
 16 DDR3-1600 channels
 Two 16-lane PCIE3.0 i/f
 ECC and parity protection on all
caches, tags and TLBs
6
Physical
• ~180M instances
• 2.0GHz@28nm
• 120W
Performance
• Peak:512GFLOPS
• Mem BW:204GB/s
• I/O BW: 32GB/s
panel0 panel1 panel3 panel2
panel4 panel5 panel7 panel6
CMC
PCIe
PCIe
DDR3
DDR3
CMC
DDR3
DDR3
CMC
DDR3
DDR3
CMC
DDR3
DDR3
CMC
DDR3
DDR3
CMC
DDR3
DDR3
CMC
DDR3
DDR3
CMC
DDR3
DDR3
Phytium Technology Co., Ltd
Panel Architecture
 Eight Xiaomi Cores
 Compatible design with ARMv8 arch license
 Both AArch32 and AArch64 modes
 EL0~EL3 supported
 ASIMD-128 supported
 Adv. hybrid Branch Prediction
 4 fetch/4 decode/4 dispatch Out-of-Order
superscalar pipeline
 Cache Hierarchy
 Separated L1 ICache and L1 Dcache
 Shared L2 cache, totally 4MB
 Directory-based cache coherency
maintenance
 Directory Control Unit (DCU)
 Routing Cell
7
Xiaomi
Xiaomi
Xiaomi
Xiaomi
L2cache
Routing Cell
DCU
DCU
Xiaomi
Xiaomi
Xiaomi
Xiaomi
L2cache
6000μm
10600μm
Phytium Technology Co., Ltd8
Xiaomi Core
ITLB I CacheBTB
DirPre
IndPre
SRS Loop
Detect
Instruction Buffer
decoderdecoderdecoderdecoder
Rename Logic
Arch.
Reg
file
Phy.
Reg
file
Dispatch Logic
Reorder
Buffer
SCInt/VT
Queue
FP/VT
Queue
LD/ST
Queue
ALU
/BR
FMA
C/FDI
V
FMA
C/FDI
V
DTLB
D Cache
L2 Cache
STB & Prefetch
Prefetch
Debug
/Trace
/Interrupt
/Timer
ALU
/BR
MUL
/DIV
MUL
/DIV
MCI/VT
Queue
Phytium Technology Co., Ltd
ITLB I CacheBTB
DirPre
IndPre
SRS Loop
Detect
Instruction Buffer
decoderdecoderdecoderdecoder
Rename Logic
Arch.
Reg
file
Phy.
Reg
file
Dispatch Logic
Reorder
Buffer
SCInt/VT
Queue
FP/VT
Queue
LD/ST
Queue
ALU
/BR
FMA
C/FDI
V
FMA
C/FDI
V
DTLB
D Cache
L2 Cache
STB & Prefetch
Prefetch
Debug
/Trace
/Interrupt
/Timer
ALU
/BR
MUL
/DIV
MUL
/DIV
MCI/VT
Queue
9
Xiaomi Core Front End
ITLB I CacheBTB
DirPre
IndPre
SRS Loop
Detect
Instruction Buffer
Prefetch
• 32KB L1 instr. Cache
• Next line prefetch
• Hybrid Branch Predictor
• 2048-entry BTB
• Direction predict with TAGE predictor
• 512-entry indirect predictor
• 48-entry Speculative Return Stack
• Four instructions fetched per cycle
• 32-entry instruction buffer
• Loop detect and Instr. Cache bypass
Phytium Technology Co., Ltd
ITLB I CacheBTB
DirPre
IndPre
SRS Loop
Detect
Instruction Buffer
decoderdecoderdecoderdecoder
Rename Logic
Arch.
Reg
file
Phy.
Reg
file
Dispatch Logic
Reorder
Buffer
SCInt/VT
Queue
FP/VT
Queue
LD/ST
Queue
ALU
/BR
FMA
C/FDI
V
FMA
C/FDI
V
DTLB
D Cache
L2 Cache
STB & Prefetch
Prefetch
Debug
/Trace
/Interrupt
/Timer
ALU
/BR
MUL
/DIV
MUL
/DIV
MCI/VT
Queue
10
Xiaomi Core Decode, Rename & Dispatch
• Up to four instructions
decoded per cycle
• 192 physical registers
• Up to four instructions
renamed per cycle
decoderdecoderdecoderdecoder
Rename Logic
Arch.
Reg
file
Phy.
Reg
file
Dispatch Logic
Reorder
Buffer
• Up to four instructions dispatched per cycle
• Reorder buffer can hold 160 instructions, and about 210+ instructions
can be in-flight in the whole pipeline.
• Dispatch in-order, execution out-of-order, retirement in-order.
Phytium Technology Co., Ltd
ITLB I CacheBTB
DirPre
IndPre
SRS Loop
Detect
Instruction Buffer
decoderdecoderdecoderdecoder
Rename Logic
Arch.
Reg
file
Phy.
Reg
file
Dispatch Logic
Reorder
Buffer
SCInt/VT
Queue
FP/VT
Queue
LD/ST
Queue
ALU
/BR
FMA
C/FDI
V
FMA
C/FDI
V
DTLB
D Cache
L2 Cache
STB & Prefetch
Prefetch
Debug
/Trace
/Interrupt
/Timer
ALU
/BR
MUL
/DIV
MUL
/DIV
MCI/VT
Queue
11
Xiaomi Core Function Units
• Two separated 16-entry integer and ASIMD queues shared by four
integer units
• Two integer unit can process single-cycle integer instructions and
integer SIMD instructions, one can also process branch instructions.
• Two integer units can process multi-cycle integer instructions and
integer SIMD instructions.
• One shared16-entry floating point and ASIMD queue
• Two FP/ASIMD units equipped, which can be combined into one
lockstep ASIMD unit.
• FMA supported in both units.
• FMUL: 3cycles, FADD: 3cycles, FMA: 6cycles
SCInt/VT
Queue
FP/VT
Queue
ALU
/BR
FMA
C/FDI
V
FMA
C/FDI
V
ALU
/BR
MUL
/DIV
MUL
/DIV
MCI/VT
Queue
Phytium Technology Co., Ltd
ITLB I CacheBTB
DirPre
IndPre
SRS Loop
Detect
Instruction Buffer
decoderdecoderdecoderdecoder
Rename Logic
Arch.
Reg
file
Phy.
Reg
file
Dispatch Logic
Reorder
Buffer
SCInt/VT
Queue
FP/VT
Queue
LD/ST
Queue
ALU
/BR
FMA
C/FDI
V
FMA
C/FDI
V
DTLB
D Cache
L2 Cache
STB & Prefetch
Prefetch
Debug
/Trace
/Interrupt
/Timer
ALU
/BR
MUL
/DIV
MUL
/DIV
MCI/VT
Queue
12
Xiaomi Core Function Units
• One 24-entry load/store queue
• 32KB L1 data cache
• 6 outstanding loads
• 4 cycles latency from load to use
• Next line and stride detected data
prefetch
• Streamlined pattern auto detected
LD/ST
Queue
DTLB
D Cache
STB & Prefetch
Phytium Technology Co., Ltd
Cache coherence protocol
 Hawk cache coherence protocol
 Distributed directory-based global cache coherency
 MOESI-like packet-based coherence protocol
 A home node DCU(directory control unit) supports
 Affinitive pairing of L2Cs and CMCs
 “Infinite” capacity for non-conflicting Reads & Writes
 Optimized transaction flow for exclusive atomic accesses
 Reduced latency by cacheline forwarding
13
L2C L2C L2C
Hawk
L3C &
Memory
I/O
Interconnects
Global Exclusive Monitor
Core0
Core7
Coherence Logic
PanelN
MEM
Core0
Core7
Coherence Logic
Panel0
Local Monitor
Phytium Technology Co., Ltd
Network on Chip
 2D Concentrated Mesh Architecture
 Cell based switch with 6 bidirectional ports
 Uniform package format for each port, a port can be configured to be
connected with a device or cascade cell
 4 physical channels for CC and 1 channel for debug, DOR Y-X routing
 Low latency: 3 cycles for each hop
 High bandwidth: 384GB/s each cell
Cell
1
5
3
0
2
4
L2cache
L2cache
MIU/IOU
MIU or
Cascade
Dest. Lat. (cycles)
0 3
1 6
2 9
3 12
4 15
5 12
6 9
7 6
Avg. 9
3
4
Cell0
1
5
3
0
2
4 Cell1
0
2
4
1
5
3
Cell4
5
1
3
2
0
4 Cell5
2
0
4
5
1
3 Cell7
5
1
3
2
0
4
Cell2
0
2
4
1
5
3Cell3
1
5
3
0
2
4
Cell6
2
0
4
5
1
3
master
0
1 2
56
7
14
Phytium Technology Co., Ltd
Cache & Memory Chip
 L3 cache
 16MB Data Array
 2MB Data ECC
 DDR bandwidth
 2 x DDR3-800:25.6GB/s
 Proprietary interface between Mars &
CMC
 Parallel interface
 Needs more pins, but lower latency than
serdes
 Separate write/cmd and read data
channel
L3
Bank0
Mars Interface
L3
Bank1
L3
Bank2
L3
Bank3
Mem
Ctrl0
Mem
Ctrl1
D
D
R
D
D
R
15
 Effective read channel bandwidth:12.8GB/s
 Effective write/cmd channel bandwidth:6.4GB/s
Phytium Technology Co., Ltd
Latency of affinitive access
Memory access latency(ns)
Local L1 cache hit ~2
Local L2 cache hit ~8
Affinitive L2 cache hit ~20
Affinitive L3 cache hit ~36
Affinitive DDR access ~70
• Panel : 2.0GHz
• NoC: 2.0GHz
• CMC: 1.5GHz
* PCB latency not considered
16
Xiaomi
Xiaomi
Xiaomi
Xiaomi
L2cache
Routing Cell
DCU
DCU
Xiaomi
Xiaomi
Xiaomi
Xiaomi
L2cache
Xiaomi
Xiaomi
Xiaomi
Xiaomi
L2cache
Routing Cell
DCU
DCU
Xiaomi
Xiaomi
Xiaomi
Xiaomi
L2cache
CMC
Phytium Technology Co., Ltd
Memory Tune (mTune)
 Rich Data Collection
 Number of cache hits/misses for L1/L2/L3
 Workload of cache pipelines
 Busyness of the NoC
 ECC corrections of the memory system
 Support Multiple Metrics
 Average Miss rate/Hit rate
 Minimal/Maximal/Average Access Latency
 Bandwidth Analysis
 Concurrent Average Memory Access Time (CAMAT)
 Support MPI/OpenMP Applications
 Thread behavior analysis
 Inter-process behavior analysis
17
Phytium Technology Co., Ltd
Scalable Debug System
 ARMv8 CoreSight Compatible debug system
 Scalable dedicated debug network across 64 cores
 Distributed debug components
 Configurable events broadcast scope
 Timestamp broadcasts with single signal to simplify
implementation
18
Cell0
DNB_C
DNB_C
1
5
3
0
DNB_M
DNB_I
2
4 Cell14 3
1 0
5 2
DNB_C
DNB_C
DNB_M
Cell4
DNB_C
DNB_C
5
1
3
2
0
4
DNB_M
Cell54 3
5 2
1 0
DNB_C
DNB_CDNB_M
Cell33 4
0 1
2 5
DNB_C
DNB_C
DNB_M
Cell24 3
1 0
5 2
DNB_C
DNB_C
DNB_M
Cell7
DNB_C
DNB_C
5
1
3
2
0
4
DNB_M
Cell64 3
5 2
1 0
DNB_C
DNB_CDNB_M
panel0 panel1
panel4 panel5
panel3 panel2
panel7 panel6
hdbg
JTAG1
JTAG2
Phytium Technology Co., Ltd
Physical Design
 28nm process
 0.9v core/1.8v IO
 10 metal layers
 ~180M instances
 2.0GHz
 120W
 640mm2 die size
 FCBGA
 ~3000 pins
19
25.38mm
25.2mm
Phytium Technology Co., Ltd
Performance Evaluation
 SpecCPU2006
20
Single copy of SPEC CPU benchmark 64 copies of SPEC CPU benchmark
19.2 17.8
0
5
10
15
20
25
INT FP
SPEC_CPU2006_base
672
585
0
100
200
300
400
500
600
700
800
INT FP
SPEC_CPU2006_rate
Phytium Technology Co., Ltd
Performance Evaluation
 STREAM
21
0
10
20
30
40
50
60
70
80
90
1 2 4 8 16 24 32 40 48 56 64
STREAM triad
#cores
Bandwidth(GB/s)
Phytium Technology Co., Ltd
Next Generation Scale-up CPU
 More powerful core
 Aggressive Branch Predictor
 Multithreading
 More aggressive ILP exploitation
 Wider SIMD
 More RAS features
 Higher bandwidth memory access
 Higher power efficiency
22
Mars: A 64-core ARMv8 Processor
Charles Zhang
Charles.zhang@phytium.com.cn

Contenu connexe

Tendances

SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialGanesan Narayanasamy
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnectinside-BigData.com
 
Arm as a Viable Architecture for HPC and AI
Arm as a Viable Architecture for HPC and AIArm as a Viable Architecture for HPC and AI
Arm as a Viable Architecture for HPC and AIinside-BigData.com
 
High Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & RankingsHigh Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & Rankingsinside-BigData.com
 
01 high bandwidth acquisitioncomputing compressionall in a box
01 high bandwidth acquisitioncomputing compressionall in a box01 high bandwidth acquisitioncomputing compressionall in a box
01 high bandwidth acquisitioncomputing compressionall in a boxYutaka Kawai
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraLinaro
 
Xilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systemsXilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systemsGanesan Narayanasamy
 

Tendances (20)

@IBM Power roadmap 8
@IBM Power roadmap 8 @IBM Power roadmap 8
@IBM Power roadmap 8
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
 
Arm in HPC
Arm in HPCArm in HPC
Arm in HPC
 
An Update on Arm HPC
An Update on Arm HPCAn Update on Arm HPC
An Update on Arm HPC
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnect
 
ARM HPC Ecosystem
ARM HPC EcosystemARM HPC Ecosystem
ARM HPC Ecosystem
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 
Ac922 cdac webinar
Ac922 cdac webinarAc922 cdac webinar
Ac922 cdac webinar
 
Arm as a Viable Architecture for HPC and AI
Arm as a Viable Architecture for HPC and AIArm as a Viable Architecture for HPC and AI
Arm as a Viable Architecture for HPC and AI
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
Sx 6-single-node
Sx 6-single-nodeSx 6-single-node
Sx 6-single-node
 
High Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & RankingsHigh Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & Rankings
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
01 high bandwidth acquisitioncomputing compressionall in a box
01 high bandwidth acquisitioncomputing compressionall in a box01 high bandwidth acquisitioncomputing compressionall in a box
01 high bandwidth acquisitioncomputing compressionall in a box
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Xilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systemsXilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systems
 

Similaire à Phytium 64 core cpu preview

Hpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server DatasheetHpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server Datasheet美兰 曾
 
MPC854XE: PowerQUICC III Processors
MPC854XE: PowerQUICC III ProcessorsMPC854XE: PowerQUICC III Processors
MPC854XE: PowerQUICC III ProcessorsPremier Farnell
 
Fujitsu Presents Post-K CPU Specifications
Fujitsu Presents Post-K CPU SpecificationsFujitsu Presents Post-K CPU Specifications
Fujitsu Presents Post-K CPU Specificationsinside-BigData.com
 
PIC32MX Microcontroller Family
PIC32MX Microcontroller FamilyPIC32MX Microcontroller Family
PIC32MX Microcontroller FamilyPremier Farnell
 
Presentation sparc m6 m5-32 server technical overview
Presentation   sparc m6 m5-32 server technical overviewPresentation   sparc m6 m5-32 server technical overview
Presentation sparc m6 m5-32 server technical overviewsolarisyougood
 
Introduction to i.MX27 Multimedia Applications Processors
Introduction to i.MX27 Multimedia Applications ProcessorsIntroduction to i.MX27 Multimedia Applications Processors
Introduction to i.MX27 Multimedia Applications ProcessorsPremier Farnell
 
ds894-zynq-ultrascale-plus-overview
ds894-zynq-ultrascale-plus-overviewds894-zynq-ultrascale-plus-overview
ds894-zynq-ultrascale-plus-overviewAngela Suen
 
SBC6020 SAM9G20 based Single Board Computer
SBC6020 SAM9G20 based Single Board ComputerSBC6020 SAM9G20 based Single Board Computer
SBC6020 SAM9G20 based Single Board Computeryclinda666
 
Overview of ST7 8-bit Microcontrollers
Overview of ST7 8-bit MicrocontrollersOverview of ST7 8-bit Microcontrollers
Overview of ST7 8-bit MicrocontrollersPremier Farnell
 
Introducing OMAP-L138/AM1808 Processor Architecture and Hawkboard Peripherals
Introducing OMAP-L138/AM1808 Processor Architecture and Hawkboard PeripheralsIntroducing OMAP-L138/AM1808 Processor Architecture and Hawkboard Peripherals
Introducing OMAP-L138/AM1808 Processor Architecture and Hawkboard PeripheralsPremier Farnell
 
Cyclone II FPGA Overview
Cyclone II FPGA OverviewCyclone II FPGA Overview
Cyclone II FPGA OverviewPremier Farnell
 
SoM with Zynq UltraScale device
SoM with Zynq UltraScale deviceSoM with Zynq UltraScale device
SoM with Zynq UltraScale devicenie, jack
 
SUN主机产品介绍.ppt
SUN主机产品介绍.pptSUN主机产品介绍.ppt
SUN主机产品介绍.pptPencilData
 
Hpe Proliant DL20 Gen10 Server Datasheet
Hpe Proliant DL20 Gen10 Server DatasheetHpe Proliant DL20 Gen10 Server Datasheet
Hpe Proliant DL20 Gen10 Server Datasheet美兰 曾
 

Similaire à Phytium 64 core cpu preview (20)

Hpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server DatasheetHpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server Datasheet
 
MPC854XE: PowerQUICC III Processors
MPC854XE: PowerQUICC III ProcessorsMPC854XE: PowerQUICC III Processors
MPC854XE: PowerQUICC III Processors
 
Fujitsu Presents Post-K CPU Specifications
Fujitsu Presents Post-K CPU SpecificationsFujitsu Presents Post-K CPU Specifications
Fujitsu Presents Post-K CPU Specifications
 
TMS320C5x
TMS320C5xTMS320C5x
TMS320C5x
 
PIC32MX Microcontroller Family
PIC32MX Microcontroller FamilyPIC32MX Microcontroller Family
PIC32MX Microcontroller Family
 
Presentation sparc m6 m5-32 server technical overview
Presentation   sparc m6 m5-32 server technical overviewPresentation   sparc m6 m5-32 server technical overview
Presentation sparc m6 m5-32 server technical overview
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
 
Introduction to i.MX27 Multimedia Applications Processors
Introduction to i.MX27 Multimedia Applications ProcessorsIntroduction to i.MX27 Multimedia Applications Processors
Introduction to i.MX27 Multimedia Applications Processors
 
ds894-zynq-ultrascale-plus-overview
ds894-zynq-ultrascale-plus-overviewds894-zynq-ultrascale-plus-overview
ds894-zynq-ultrascale-plus-overview
 
SBC6020 SAM9G20 based Single Board Computer
SBC6020 SAM9G20 based Single Board ComputerSBC6020 SAM9G20 based Single Board Computer
SBC6020 SAM9G20 based Single Board Computer
 
Overview of ST7 8-bit Microcontrollers
Overview of ST7 8-bit MicrocontrollersOverview of ST7 8-bit Microcontrollers
Overview of ST7 8-bit Microcontrollers
 
Introducing OMAP-L138/AM1808 Processor Architecture and Hawkboard Peripherals
Introducing OMAP-L138/AM1808 Processor Architecture and Hawkboard PeripheralsIntroducing OMAP-L138/AM1808 Processor Architecture and Hawkboard Peripherals
Introducing OMAP-L138/AM1808 Processor Architecture and Hawkboard Peripherals
 
Cyclone II FPGA Overview
Cyclone II FPGA OverviewCyclone II FPGA Overview
Cyclone II FPGA Overview
 
Project report
Project reportProject report
Project report
 
SoM with Zynq UltraScale device
SoM with Zynq UltraScale deviceSoM with Zynq UltraScale device
SoM with Zynq UltraScale device
 
Smart logic
Smart logicSmart logic
Smart logic
 
AT89 S52
AT89 S52AT89 S52
AT89 S52
 
SUN主机产品介绍.ppt
SUN主机产品介绍.pptSUN主机产品介绍.ppt
SUN主机产品介绍.ppt
 
Hpe Proliant DL20 Gen10 Server Datasheet
Hpe Proliant DL20 Gen10 Server DatasheetHpe Proliant DL20 Gen10 Server Datasheet
Hpe Proliant DL20 Gen10 Server Datasheet
 
Intel core i5
Intel core i5Intel core i5
Intel core i5
 

Plus de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...inside-BigData.com
 

Plus de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 

Dernier

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Dernier (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Phytium 64 core cpu preview

  • 1. Mars: A 64-core ARMv8 Processor Charles Zhang Phytium Technology Co., Ltd
  • 2. Phytium Technology Co., Ltd Statements The following slides are presented to introduce the general features of one of our products, instead of any commitment about it. It is for information purposes only, and may not be incorporated into any contract. It is not suggested to make purchasing decisions accordingly. The development, release, and timing of any features or functionality described here remains at the sole discretion of Phytium. 2
  • 3. Phytium Technology Co., Ltd A Brief Introduction of Phytium  China corporation, founded in 2012  Guangzhou  Tianjin  Vision  Leading edge CPU and ASIC provider in China  Market focuses on chips for  Internet & Cloud Computing infrastructure  Traditional workload mainframe servers 3
  • 4. Phytium Technology Co., Ltd China is a Fast-growing Server Market 4 Company 1Q15 Revenue 1Q15 Market Share (%) 1Q14 Revenue 1Q14 Market Share (%) 1Q15-1Q14 Growth (%) HP 3,191,694,948 23.8 2,890,992,229 25.5 10.4 Dell 2,296,473,026 17.1 2,006,639,006 17.7 14.4 IBM 1,887,939,141 14.1 2,244,631,789 19.8 -15.9 Lenovo 970,254,659 7.2 127,973,470 1.1 658.2 Cisco 890,179,930 6.6 616,620,000 5.4 44.4 Others 4,157,871,704 31.0 3,469,383,444 30.6 19.8 Total 13,394,413,409 100.0 11,356,239,939 100.0 17.9 Company 1Q15 Revenue 1Q15 Market Share (%) 1Q14 Revenue 1Q14 Market Share (%) 1Q15-1Q14 Growth (%) Inspur 332,613,480 21 227,328,256 17 46 Dell 322,063,140 20 246,281,271 19 31 Lenovo 295,914,571 18 80,084,826 6 270 HP 217,487,450 14 167,775,923 13 30 Huawei 197,490,419 12 189,963,266 14 4 Sugon 140,377.091 9 70,705,366 5 99 Others 104,566,737 6 329,549,621 25 -68 Total 1,610,512,888 100.0 1,311,688,529 100.0 23 Source: Gartner (May 2015) China WW
  • 5. Phytium Technology Co., Ltd What is Mars for? 5 High performance High volume of memory High bandwidth memory access High bandwidth I/O access Large scale cache coherency maintained Moderate performance High power efficiency High density computing High bandwidth memory access Low cost Mars Earth
  • 6. Phytium Technology Co., Ltd Mars Overview  Architecture Features  64 Xiaomi cores, ARMv8 compatible  Hardware-maintained global cache coherency  Panel-based data affinity architecture  Mesh topology on chip network  32MB L2 cache  8 Cache & Memory Chips (CMC)  128MB L3 cache  16 DDR3-1600 channels  Two 16-lane PCIE3.0 i/f  ECC and parity protection on all caches, tags and TLBs 6 Physical • ~180M instances • 2.0GHz@28nm • 120W Performance • Peak:512GFLOPS • Mem BW:204GB/s • I/O BW: 32GB/s panel0 panel1 panel3 panel2 panel4 panel5 panel7 panel6 CMC PCIe PCIe DDR3 DDR3 CMC DDR3 DDR3 CMC DDR3 DDR3 CMC DDR3 DDR3 CMC DDR3 DDR3 CMC DDR3 DDR3 CMC DDR3 DDR3 CMC DDR3 DDR3
  • 7. Phytium Technology Co., Ltd Panel Architecture  Eight Xiaomi Cores  Compatible design with ARMv8 arch license  Both AArch32 and AArch64 modes  EL0~EL3 supported  ASIMD-128 supported  Adv. hybrid Branch Prediction  4 fetch/4 decode/4 dispatch Out-of-Order superscalar pipeline  Cache Hierarchy  Separated L1 ICache and L1 Dcache  Shared L2 cache, totally 4MB  Directory-based cache coherency maintenance  Directory Control Unit (DCU)  Routing Cell 7 Xiaomi Xiaomi Xiaomi Xiaomi L2cache Routing Cell DCU DCU Xiaomi Xiaomi Xiaomi Xiaomi L2cache 6000μm 10600μm
  • 8. Phytium Technology Co., Ltd8 Xiaomi Core ITLB I CacheBTB DirPre IndPre SRS Loop Detect Instruction Buffer decoderdecoderdecoderdecoder Rename Logic Arch. Reg file Phy. Reg file Dispatch Logic Reorder Buffer SCInt/VT Queue FP/VT Queue LD/ST Queue ALU /BR FMA C/FDI V FMA C/FDI V DTLB D Cache L2 Cache STB & Prefetch Prefetch Debug /Trace /Interrupt /Timer ALU /BR MUL /DIV MUL /DIV MCI/VT Queue
  • 9. Phytium Technology Co., Ltd ITLB I CacheBTB DirPre IndPre SRS Loop Detect Instruction Buffer decoderdecoderdecoderdecoder Rename Logic Arch. Reg file Phy. Reg file Dispatch Logic Reorder Buffer SCInt/VT Queue FP/VT Queue LD/ST Queue ALU /BR FMA C/FDI V FMA C/FDI V DTLB D Cache L2 Cache STB & Prefetch Prefetch Debug /Trace /Interrupt /Timer ALU /BR MUL /DIV MUL /DIV MCI/VT Queue 9 Xiaomi Core Front End ITLB I CacheBTB DirPre IndPre SRS Loop Detect Instruction Buffer Prefetch • 32KB L1 instr. Cache • Next line prefetch • Hybrid Branch Predictor • 2048-entry BTB • Direction predict with TAGE predictor • 512-entry indirect predictor • 48-entry Speculative Return Stack • Four instructions fetched per cycle • 32-entry instruction buffer • Loop detect and Instr. Cache bypass
  • 10. Phytium Technology Co., Ltd ITLB I CacheBTB DirPre IndPre SRS Loop Detect Instruction Buffer decoderdecoderdecoderdecoder Rename Logic Arch. Reg file Phy. Reg file Dispatch Logic Reorder Buffer SCInt/VT Queue FP/VT Queue LD/ST Queue ALU /BR FMA C/FDI V FMA C/FDI V DTLB D Cache L2 Cache STB & Prefetch Prefetch Debug /Trace /Interrupt /Timer ALU /BR MUL /DIV MUL /DIV MCI/VT Queue 10 Xiaomi Core Decode, Rename & Dispatch • Up to four instructions decoded per cycle • 192 physical registers • Up to four instructions renamed per cycle decoderdecoderdecoderdecoder Rename Logic Arch. Reg file Phy. Reg file Dispatch Logic Reorder Buffer • Up to four instructions dispatched per cycle • Reorder buffer can hold 160 instructions, and about 210+ instructions can be in-flight in the whole pipeline. • Dispatch in-order, execution out-of-order, retirement in-order.
  • 11. Phytium Technology Co., Ltd ITLB I CacheBTB DirPre IndPre SRS Loop Detect Instruction Buffer decoderdecoderdecoderdecoder Rename Logic Arch. Reg file Phy. Reg file Dispatch Logic Reorder Buffer SCInt/VT Queue FP/VT Queue LD/ST Queue ALU /BR FMA C/FDI V FMA C/FDI V DTLB D Cache L2 Cache STB & Prefetch Prefetch Debug /Trace /Interrupt /Timer ALU /BR MUL /DIV MUL /DIV MCI/VT Queue 11 Xiaomi Core Function Units • Two separated 16-entry integer and ASIMD queues shared by four integer units • Two integer unit can process single-cycle integer instructions and integer SIMD instructions, one can also process branch instructions. • Two integer units can process multi-cycle integer instructions and integer SIMD instructions. • One shared16-entry floating point and ASIMD queue • Two FP/ASIMD units equipped, which can be combined into one lockstep ASIMD unit. • FMA supported in both units. • FMUL: 3cycles, FADD: 3cycles, FMA: 6cycles SCInt/VT Queue FP/VT Queue ALU /BR FMA C/FDI V FMA C/FDI V ALU /BR MUL /DIV MUL /DIV MCI/VT Queue
  • 12. Phytium Technology Co., Ltd ITLB I CacheBTB DirPre IndPre SRS Loop Detect Instruction Buffer decoderdecoderdecoderdecoder Rename Logic Arch. Reg file Phy. Reg file Dispatch Logic Reorder Buffer SCInt/VT Queue FP/VT Queue LD/ST Queue ALU /BR FMA C/FDI V FMA C/FDI V DTLB D Cache L2 Cache STB & Prefetch Prefetch Debug /Trace /Interrupt /Timer ALU /BR MUL /DIV MUL /DIV MCI/VT Queue 12 Xiaomi Core Function Units • One 24-entry load/store queue • 32KB L1 data cache • 6 outstanding loads • 4 cycles latency from load to use • Next line and stride detected data prefetch • Streamlined pattern auto detected LD/ST Queue DTLB D Cache STB & Prefetch
  • 13. Phytium Technology Co., Ltd Cache coherence protocol  Hawk cache coherence protocol  Distributed directory-based global cache coherency  MOESI-like packet-based coherence protocol  A home node DCU(directory control unit) supports  Affinitive pairing of L2Cs and CMCs  “Infinite” capacity for non-conflicting Reads & Writes  Optimized transaction flow for exclusive atomic accesses  Reduced latency by cacheline forwarding 13 L2C L2C L2C Hawk L3C & Memory I/O Interconnects Global Exclusive Monitor Core0 Core7 Coherence Logic PanelN MEM Core0 Core7 Coherence Logic Panel0 Local Monitor
  • 14. Phytium Technology Co., Ltd Network on Chip  2D Concentrated Mesh Architecture  Cell based switch with 6 bidirectional ports  Uniform package format for each port, a port can be configured to be connected with a device or cascade cell  4 physical channels for CC and 1 channel for debug, DOR Y-X routing  Low latency: 3 cycles for each hop  High bandwidth: 384GB/s each cell Cell 1 5 3 0 2 4 L2cache L2cache MIU/IOU MIU or Cascade Dest. Lat. (cycles) 0 3 1 6 2 9 3 12 4 15 5 12 6 9 7 6 Avg. 9 3 4 Cell0 1 5 3 0 2 4 Cell1 0 2 4 1 5 3 Cell4 5 1 3 2 0 4 Cell5 2 0 4 5 1 3 Cell7 5 1 3 2 0 4 Cell2 0 2 4 1 5 3Cell3 1 5 3 0 2 4 Cell6 2 0 4 5 1 3 master 0 1 2 56 7 14
  • 15. Phytium Technology Co., Ltd Cache & Memory Chip  L3 cache  16MB Data Array  2MB Data ECC  DDR bandwidth  2 x DDR3-800:25.6GB/s  Proprietary interface between Mars & CMC  Parallel interface  Needs more pins, but lower latency than serdes  Separate write/cmd and read data channel L3 Bank0 Mars Interface L3 Bank1 L3 Bank2 L3 Bank3 Mem Ctrl0 Mem Ctrl1 D D R D D R 15  Effective read channel bandwidth:12.8GB/s  Effective write/cmd channel bandwidth:6.4GB/s
  • 16. Phytium Technology Co., Ltd Latency of affinitive access Memory access latency(ns) Local L1 cache hit ~2 Local L2 cache hit ~8 Affinitive L2 cache hit ~20 Affinitive L3 cache hit ~36 Affinitive DDR access ~70 • Panel : 2.0GHz • NoC: 2.0GHz • CMC: 1.5GHz * PCB latency not considered 16 Xiaomi Xiaomi Xiaomi Xiaomi L2cache Routing Cell DCU DCU Xiaomi Xiaomi Xiaomi Xiaomi L2cache Xiaomi Xiaomi Xiaomi Xiaomi L2cache Routing Cell DCU DCU Xiaomi Xiaomi Xiaomi Xiaomi L2cache CMC
  • 17. Phytium Technology Co., Ltd Memory Tune (mTune)  Rich Data Collection  Number of cache hits/misses for L1/L2/L3  Workload of cache pipelines  Busyness of the NoC  ECC corrections of the memory system  Support Multiple Metrics  Average Miss rate/Hit rate  Minimal/Maximal/Average Access Latency  Bandwidth Analysis  Concurrent Average Memory Access Time (CAMAT)  Support MPI/OpenMP Applications  Thread behavior analysis  Inter-process behavior analysis 17
  • 18. Phytium Technology Co., Ltd Scalable Debug System  ARMv8 CoreSight Compatible debug system  Scalable dedicated debug network across 64 cores  Distributed debug components  Configurable events broadcast scope  Timestamp broadcasts with single signal to simplify implementation 18 Cell0 DNB_C DNB_C 1 5 3 0 DNB_M DNB_I 2 4 Cell14 3 1 0 5 2 DNB_C DNB_C DNB_M Cell4 DNB_C DNB_C 5 1 3 2 0 4 DNB_M Cell54 3 5 2 1 0 DNB_C DNB_CDNB_M Cell33 4 0 1 2 5 DNB_C DNB_C DNB_M Cell24 3 1 0 5 2 DNB_C DNB_C DNB_M Cell7 DNB_C DNB_C 5 1 3 2 0 4 DNB_M Cell64 3 5 2 1 0 DNB_C DNB_CDNB_M panel0 panel1 panel4 panel5 panel3 panel2 panel7 panel6 hdbg JTAG1 JTAG2
  • 19. Phytium Technology Co., Ltd Physical Design  28nm process  0.9v core/1.8v IO  10 metal layers  ~180M instances  2.0GHz  120W  640mm2 die size  FCBGA  ~3000 pins 19 25.38mm 25.2mm
  • 20. Phytium Technology Co., Ltd Performance Evaluation  SpecCPU2006 20 Single copy of SPEC CPU benchmark 64 copies of SPEC CPU benchmark 19.2 17.8 0 5 10 15 20 25 INT FP SPEC_CPU2006_base 672 585 0 100 200 300 400 500 600 700 800 INT FP SPEC_CPU2006_rate
  • 21. Phytium Technology Co., Ltd Performance Evaluation  STREAM 21 0 10 20 30 40 50 60 70 80 90 1 2 4 8 16 24 32 40 48 56 64 STREAM triad #cores Bandwidth(GB/s)
  • 22. Phytium Technology Co., Ltd Next Generation Scale-up CPU  More powerful core  Aggressive Branch Predictor  Multithreading  More aggressive ILP exploitation  Wider SIMD  More RAS features  Higher bandwidth memory access  Higher power efficiency 22
  • 23. Mars: A 64-core ARMv8 Processor Charles Zhang Charles.zhang@phytium.com.cn