SmartCore System for Dependable Many-core Processor with Multifunction Routers (in ICNC'10 Hiroshima)
1. SmartCore system for Dependable Many-core Processor with Multifunction Routers
Shinya Takamaeda†, Shimpei Sato†,
Takefumi Miyoshi‡, Kenji Kise†
†Tokyo Institute of Technology, Japan
‡The University of Electro-communications, Japan
10-11-18 ICNC’10 @Hiroshima
Regular Paper
Hardware Design and Implementation
14:50-15:20
5. Interconnection for Many-core Processors
NoC (Network on Chip)
Data transmission via on-chip routers
[Figure: NoC of 16 processing elements (PE), each attached to an on-chip router (R)]
6. Low Dependability of Many-core Processors
Process technology scaling provides more transistors
But it also increases …
Soft errors (e.g. bit flips)
• caused by cosmic radiation
Timing errors
• caused by variations in transistor characteristics or wire delay
How can we build a reliable many-core processor?
7. Assurance of Reliability at Each Layer
Layers: Circuit, Micro-architecture, Architecture, Software
Existing techniques: Razor-FF, Canary-FF, Lock-step, Check-pointing / Re-execution, ECC in DRAM memory, Architectural Core Salvaging, Slipstream Processor
Inter-connection: SmartCore system
9. We propose the SmartCore system
SmartCore system
= Smart many-core system with redundant cores and multifunction routers
Key: NoC-based DMR (Dual Modular Redundancy)
To detect an error, the router compares the output packets from the redundant pair
The on-chip router has 3 special functions
• Copy a packet
• Change the destination
• Wait for and compare 2 packets
[Figure: a pair of nodes (PE + R) running the same thread (DMR), which together behave as a single logical thread; the routers handle the same packets by packet copying, sharing an incoming packet and comparing the 2 outgoing packets]
10. Base many-core architecture: M-Core [1]
A 2D mesh network connects the Nodes
Each Node's memory is independent
Inter-Node communication: DMA via packets addressed by Node ID
A packet is a series of flits (Flow Control Units)
• Only the head flit of a packet contains the destination
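To make the packet format concrete, here is a minimal Python sketch of how a DMA transfer might be split into flits. The `Flit` class, the `packetize` function, and the choice to carry only the destination (and no data) in the head flit are illustrative assumptions; only the 4-byte flit size comes from the evaluation setup of this talk.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NodeId = Tuple[int, int]  # (x, y) coordinates in the 2D mesh

FLIT_BYTES = 4  # flit size used in the evaluation (SimMc 1.0)


@dataclass(frozen=True)
class Flit:
    is_head: bool
    is_tail: bool
    dest: Optional[NodeId]  # set only on the head flit
    payload: bytes          # at most FLIT_BYTES of data


def packetize(dest: NodeId, data: bytes) -> List[Flit]:
    """Split a DMA transfer into one head flit followed by payload flits.

    Assumption for this sketch: the head flit carries only the destination,
    so routers need to inspect just flit 0 to pick an output port.
    """
    chunks = [data[i:i + FLIT_BYTES] for i in range(0, len(data), FLIT_BYTES)]
    flits = [Flit(is_head=True, is_tail=not chunks, dest=dest, payload=b"")]
    for i, chunk in enumerate(chunks):
        flits.append(Flit(is_head=False,
                          is_tail=(i == len(chunks) - 1),
                          dest=None,
                          payload=chunk))
    return flits
```

For example, `packetize((2, 1), b"abcdefgh")` yields three flits: a head flit addressed to Node (2,1) and two 4-byte body flits, the last marked as the tail.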
[Figure: M-Core chip. Computation Nodes (1,1) to (8,8) form the 2D mesh; the Operation Node (0,0) and Memory Nodes (1,0) to (8,0) connect the many-core processor chip to off-chip memory modules, a switch, and conventional I/O. Each Node, e.g. Node (1,1), contains a Core, Node memory, an INCC, and a Router]
11. DMR on two nodes using SmartCore
The same program binary is executed on a pair of Nodes: the Master Node and the Mirror Node
If the generated packets differ, one of the two Nodes is faulty
Packet copying on the Router of the Master
so that the Mirror uses the same data as the Master
Packet comparison on the Router of the Master
If the two packets differ, the Router detects an error
[Figure: Nodes (1,1) to (4,2) in a 2D mesh; a Master Node and its Mirror Node together act as logical Node (1,1)]
12. 1. Copying a packet to the Mirror Node
The Router on the Master Node copies an incoming packet to the Mirror Node
The destination is changed to the Mirror Node's ID
The original program performs several DMA communications
Copying ensures that the two Nodes keep executing the same program on the same data
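The copy step can be sketched as follows. This is a hypothetical Python model, not the router's hardware: because only the head flit carries the destination, the ID translator needs to rewrite just that flit, while the body flits are duplicated unchanged. `Flit` and `copy_for_mirror` are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

NodeId = Tuple[int, int]  # (x, y) mesh coordinates


@dataclass(frozen=True)
class Flit:
    dest: Optional[NodeId]  # set only on the head flit
    payload: bytes


def copy_for_mirror(packet: List[Flit],
                    mirror_id: NodeId) -> Tuple[List[Flit], List[Flit]]:
    """Return (packet delivered to the Master's core, copy sent to the Mirror).

    The ID translator rewrites the destination in the head flit only;
    the body flits are shared unchanged by both copies.
    """
    head, body = packet[0], packet[1:]
    mirror_copy = [replace(head, dest=mirror_id)] + list(body)
    return list(packet), mirror_copy
```

In this model the Mirror receives byte-identical payload flits, which is what lets both Nodes continue with the same input data.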
[Figure: Master and Mirror Nodes, each with a processor (P), INCC, and Router (R)]
13. 2. Wait for a packet from the Mirror Node
3. Compare the contents of the two packets
The Router on the Master Node waits for a packet from the Master Node and a packet from the Mirror Node
Once the Router on the Master has received the head flits from both Nodes, it compares the two packets flit by flit, in order
If the contents of the flits differ, an error exists in either the Master Node or the Mirror Node
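A minimal behavioral sketch of steps 2 and 3, assuming flits arrive one at a time and are modeled as `bytes`. The class name `FlitComparator` and the policy of forwarding the Master's flit after each comparison are assumptions for illustration; the slides only state that the router waits for both head flits and then compares the packets flit by flit.

```python
from collections import deque
from typing import Deque, Optional


class FlitComparator:
    """Hypothetical model of the Master router's wait-and-compare unit.

    Flits from the Master core and from the Mirror Node are buffered
    separately. A comparison happens only when a flit is available from
    both sides (the rendezvous), and pairs are compared in arrival order.
    """

    def __init__(self) -> None:
        self.master: Deque[bytes] = deque()
        self.mirror: Deque[bytes] = deque()
        self.error = False

    def push(self, from_mirror: bool, flit: bytes) -> Optional[bytes]:
        """Buffer one flit; return the flit to forward, or None while waiting."""
        (self.mirror if from_mirror else self.master).append(flit)
        # Each push can complete at most one new pair, so one step suffices.
        return self._step()

    def _step(self) -> Optional[bytes]:
        if not self.master or not self.mirror:
            return None  # still waiting for the other Node
        a = self.master.popleft()
        b = self.mirror.popleft()
        if a != b:
            self.error = True  # fault in either the Master or the Mirror
        return a  # assumption: the Master's flit is forwarded
```

For example, pushing a head flit from the Master returns `None` until the Mirror's head flit arrives; a later mismatching pair sets `error` to `True`.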
[Figure: Master and Mirror Nodes, each with a processor (P), INCC, and Router (R)]
14. Base router
5 inputs with input buffers / 5 outputs
X-Y dimension-order routing
Wormhole switching, Xon/Xoff flow control
1 hop / 1 cycle (single-cycle router), no virtual channels
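X-Y dimension-order routing can be expressed in a few lines. In this sketch a packet first corrects its x coordinate and then its y coordinate; the direction convention (X+ means increasing x, Y+ increasing y) is an assumption, and `DMAC` denotes delivery to the local node, following the port names in the router figure. Because the route is minimal, the number of hops equals the Manhattan distance between source and destination.

```python
from typing import Dict, List, Tuple

NodeId = Tuple[int, int]  # (x, y) mesh coordinates

# Coordinate change for each output port (assumed direction convention).
STEP: Dict[str, Tuple[int, int]] = {
    "X+": (1, 0), "X-": (-1, 0), "Y+": (0, 1), "Y-": (0, -1),
}


def xy_route(current: NodeId, dest: NodeId) -> str:
    """Output port chosen at `current`: resolve X first, then Y."""
    (cx, cy), (dx, dy) = current, dest
    if dx > cx:
        return "X+"
    if dx < cx:
        return "X-"
    if dy > cy:
        return "Y+"
    if dy < cy:
        return "Y-"
    return "DMAC"  # arrived: eject to the local node


def route_path(src: NodeId, dest: NodeId) -> List[NodeId]:
    """Sequence of routers visited from src to dest."""
    path = [src]
    cur = src
    while (port := xy_route(cur, dest)) != "DMAC":
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)
    return path


def hops(src: NodeId, dest: NodeId) -> int:
    """XY routing is minimal, so the hop count is the Manhattan distance."""
    return abs(dest[0] - src[0]) + abs(dest[1] - src[1])
```

For example, `route_path((1, 1), (3, 2))` visits (1,1), (2,1), (3,1), (3,2), i.e. 3 hops. Since XY routing only ever turns from X to Y, it cannot form cyclic channel dependencies, which is one reason it pairs well with wormhole switching without virtual channels.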
[Figure: Base router. Input ports X+, X-, Y+, Y-, and DMAC feed a crossbar (XBAR) switch controlled by an arbiter, which drives output ports X+, X-, Y+, Y-, and DMAC]
15. Multifunction router for SmartCore system
Additional buffer for copying packets to the Mirror Node (a)
ID translator to change the destination (b)
Flit comparator for verification (c)
Node type and Master/Mirror Node IDs
are configured by system software
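The slides state only that the node type and the Master/Mirror Node IDs are configured by system software. The sketch below shows one hypothetical way such a configuration table could be built, assuming each logical node is mapped to two horizontally adjacent physical nodes (Master at odd x, Mirror at the next x), as the Master/Mirror labels in the FPGA prototype figure suggest. `RouterConfig` and `configure_pairs` are illustrative names, not the actual system software.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

NodeId = Tuple[int, int]  # (x, y) physical mesh coordinates


@dataclass(frozen=True)
class RouterConfig:
    node_type: str      # "master" or "mirror"
    master_id: NodeId   # physical ID of the pair's Master Node
    mirror_id: NodeId   # physical ID of the pair's Mirror Node


def configure_pairs(logical_w: int, logical_h: int) -> Dict[NodeId, RouterConfig]:
    """Hypothetical SmartCore-mode setup for every physical router.

    Assumed mapping: logical node (lx, ly) -> Master (2*lx - 1, ly),
    Mirror (2*lx, ly). Both routers of a pair store the same two IDs,
    so the Master knows where to copy packets and the Mirror knows
    where to send packets for comparison.
    """
    table: Dict[NodeId, RouterConfig] = {}
    for lx in range(1, logical_w + 1):
        for ly in range(1, logical_h + 1):
            master = (2 * lx - 1, ly)
            mirror = (2 * lx, ly)
            for phys, kind in ((master, "master"), (mirror, "mirror")):
                table[phys] = RouterConfig(kind, master, mirror)
    return table
```

Under this assumed mapping, a 2×3 logical grid, Logical IDs (1,1) to (2,3), occupies 12 physical compute nodes, (1,1) to (4,3).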
[Figure: Multifunction router. The base router (XBAR switch, arbiter, ports X+, X-, Y+, Y-, and INCC input/output ports) extended with (a) a copy buffer, (b) ID translation, and (c) a verify unit, configured by the node type (master / mirror) and ID registers]
16. Advantages of SmartCore system
Adaptable to any kind of hardware module that generates packets
e.g. cache, DSP, processor core
Because …
the error detection mechanism is independent of the Node structure
• Core-granularity redundant execution / packet-level error detection
18. Preliminary Evaluation of SmartCore system
2 evaluations
Performance overhead of DMR
Packet rendezvous time
Environment: SimMc 1.0
64 (8×8) threads on 128 (16×8) Nodes
Core
• MIPS32 single-issue / single-cycle processor
Router
• 1 hop / 1 cycle, no virtual channels, flit size: 4 bytes
INCC (Network Interface)
• sends/receives up to 1 flit / cycle to/from the router
Benchmark: 4 apps from the NAS Parallel Benchmarks
• cg, ft, is, lu; problem size: S
[Figure: Node (X, Y) consisting of a Core, Node memory, INCC, and Router]
19. 3 configurations of thread mapping
[Figure: placement of threads (1,1) to (8,8) on the mesh]
(a) Base allocation: 8×8 Nodes
(b) Redundant space allocation (Area 2x): 16×8 Nodes, of which only the proper threads' Nodes are working; used to see the effect of #hops
(c) Redundant execution with SmartCore system: 16×8 Nodes, each proper thread (Master Node) paired with a redundant thread (Mirror Node); used to see the effect of SmartCore
20. Evaluation: Performance overhead of DMR
Only a small slowdown
Redundant space allocation (Area 2x): up to 1% slowdown
Redundant execution (SmartCore): up to 4% slowdown (in cg of NPB)
21. Evaluation: Packet rendezvous time
Cumulative distribution of the number of cycles that the router on the Master Node waits for a packet from the Mirror Node
Most communications have only a short rendezvous time
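The rendezvous-time metric is a cumulative distribution over per-packet wait cycles. A minimal sketch of how it could be computed from a list of measured wait times; the function name and input format are assumptions, and the example values below are made up for illustration, not results from this evaluation.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rendezvous_cdf(wait_cycles: Iterable[int]) -> List[Tuple[int, float]]:
    """Cumulative distribution of rendezvous wait times.

    Returns (cycles, fraction of packets that waited <= cycles) pairs,
    sorted by cycles. An empty input yields an empty list.
    """
    counts = Counter(wait_cycles)
    total = sum(counts.values())
    cdf: List[Tuple[int, float]] = []
    running = 0
    for cycles in sorted(counts):
        running += counts[cycles]
        cdf.append((cycles, running / total))
    return cdf
```

With the illustrative input `[0, 0, 1, 3]`, `rendezvous_cdf` returns `[(0, 0.5), (1, 0.75), (3, 1.0)]`: half of the packets did not wait at all.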
[Figure: cumulative distribution of rendezvous cycles for cg, ft, is, and lu]
23. Hardware Implementation on FPGAs
Dependable many-core processor on an FPGA-based prototyping system
built with the ScalableCore system [8]
• Connected FPGA boards
• Variable number of FPGA boards
2 execution modes
• Normal Mode
– Standard M-Core
• SmartCore Mode
– Each Master/Mirror pair executes the same thread
[Figure: FPGA prototype. A Loader node (0,1) with an SD card and Path nodes (0,2) and (0,3); compute nodes with Physical IDs (1,1) to (4,3), arranged as Master/Mirror pairs in each row, form Logical IDs (1,1) to (2,3)]
24. Overview of the 15-Node ScalableCore system
with SmartCore system
[Figure: six Master/Mirror pairs forming Logical IDs (1,1) to (2,3), plus the Program Loader at ID (0,1)]
The SmartCore system detects an artificial fault
28. Conclusion
We propose the SmartCore system
NoC-based DMR using multifunction routers
The multifunction router has 3 special functions
• Copying a packet
• Changing the destination of a packet
• Waiting for and comparing the contents of two packets
Low performance overhead
Hardware implementation on an FPGA-based prototyping system
Future work
Recovery after error detection
TMR using the SmartCore system