This presentation was given at ITC 2008 (International Test Conference). It deals with DFX challenges and solutions for high-count multi-core microprocessors. Acknowledgment: Co-authors on ITC presentation - Gaurav Agarwal, Sriram Anandakumar, Gordon Liu, Rajesh Pendurkar, Krishna Rajan and Frank Chiu.
DFX Architecture for High-performance Multi-core Microprocessors
1. DFX Architecture for High-Performance Multi-core Processors
Ishwar Parulkar
Sun Microsystems, Inc.
2. Outline
• Processor Overview
• DFX Challenges/Opportunities in 3rd Gen CMT
• Scan flop and chain configurations
• Embedded Memories
• Deterministic Functional Test
• System Test and Debug
• Enhancing Yield and RAS
• Conclusions
7. Characteristics of 3rd Gen CMT
Processors
• 16 or more complex, multi-threaded cores
– Scout threads
– Execute ahead
– Simultaneous speculative threading
– Transactional memory
– Parallelization of programs
– Near-linear scalability for multiple sockets
• High bandwidth in and out of chip => Serdes
• Chip configurations with subset of cores
8. DFX Challenges in 3rd Gen CMT
Processors
• Amplification of DFX cost because of high
degree of replication
– global versus local trade-offs
• Testing of complex structures
– 3-D register files; multi-ported memories
• Testing large-scale implementation of SerDes
• Deterministic behavior on ATE and in system
in presence of non-deterministic SerDes
9. DFX Opportunities in 3rd Gen CMT
Processors
• Yield enhancement
– Binning on throughput performance
• On-line Availability
– Detection and isolation of defective cores and/or thread hardware
• Rapid design of derivative chip family
– Minimal DFX design, verification and test
pattern generation effort
12. Choice of Scan Flop
• Scan path and operation impervious to variation
– scan circuits are uniform across flop types - dynamic
front-ends; pulse clocked
– scan path is static and robust across process
variation
• Can be extended (by adding one more latch) to
observe flop state dynamically
• Consumes less dynamic power because of reduced
load on functional clock
13. Scan Chain Architecture
• Requirements
– How do you manage 1.35 million scan flops in a CMT design?
• Considerations in architecting scan chains
– Efficient identification of partial good cores
– Partial core chip configurations
– Handling of special flops in non-ATPG scenarios (e.g. redundancy registers, clock control, Logic BIST, etc.)
– Efficiency of scan patterns on ATE
– IDS probe loop time for debug
– Efficiency of scan-dump in system debug
– Usability of scan in presence of scan bugs
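One way to picture the partial-core considerations above is a chain built from per-core segments, each of which can be bypassed. The sketch below is only an illustrative model, not the scan architecture used in the design: the `Segment` class, the `bypassed` flag, and the two-flop segment sizes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One per-core scan segment (hypothetical model)."""
    name: str
    flops: List[int]        # flop values, scan-in end first
    bypassed: bool = False  # True for a defective or absent core


def shift(chain: List[Segment], scan_in: int) -> int:
    """Shift the whole chain by one bit and return the scan-out bit."""
    bit = scan_in
    for seg in chain:
        if seg.bypassed or not seg.flops:
            continue  # bypass path forwards the bit unchanged
        seg.flops.insert(0, bit)  # new bit enters at the scan-in end
        bit = seg.flops.pop()     # oldest bit leaves at the scan-out end
    return bit


def chain_length(chain: List[Segment]) -> int:
    """Number of flops actually in the shift path."""
    return sum(len(s.flops) for s in chain if not s.bypassed)
```

With a segment bypassed, the chain is shorter and the flops of the bypassed core are never disturbed, so the same pattern set can be retargeted to a partial-core configuration by accounting for the reduced length.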
14. Scan Chain Architecture
[Figure: CC Level Scan Configuration, with m scan chains between the chip scan inputs and the chip scan outputs]
16. Embedded Memory Test - Challenges
• Scale
– >1100 instances of embedded memories
• Variation of size and type
– 2MB L2-cache to 8-32 entry queues
• Complex, specialized arrays
– 3-D register files; CAM-RAM combinations; multi-ported memories
• Sun/TI specific test requirements
– Direct pin access to large memories
– Efficiency and ease of bit-mapping
17. At-speed Test of Memories via Scan
[Figure: memory block diagram. Labels: W Address, W Enable, W Data, W Word Lines, R Address, R Enable, R Data, R Word Lines, Memory Clock, Clock Header, Functional Clock, Scan_In Clock]
18. SPARC ASI Network
• Access network on chip corresponding to Address
Space Identifier (ASI) in SPARC memory model
• Uses of ASI accesses
– Normal operation
– Chip configuration by software
– Transfer of information programmed in E-fuse
farm to internal registers
– Failures in field
– Diagnosis of failures
– Reconfiguration of chip
– Engineering Bring-up
– Error injection for post-silicon validation of RAS
– Observability during debug
19. SPARC ASI Network Implementation
• ASI network is hierarchical
– Star and daisy chain
• ASI routing hubs in units
– Packets routed based on destination array ID
• Dedicated or shared ASI paths
– Muxing could be anywhere in the path to array
[Figure: ASI network block diagram. Labels: CORE Pipeline; ASI Switch Network; Service Port; System Management Control Unit; MEMORY with WRITE DATA, ADDRESS, R/W CONTROL and READ DATA]
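Routing by destination array ID through a hierarchy of hubs can be sketched as a simple tree walk. This is a hedged illustration only: the `Hub` class, the hub names, and the numeric array IDs are hypothetical and do not come from the presentation.

```python
from typing import List, Optional


class Hub:
    """A routing hub that owns some arrays and forwards to child hubs."""

    def __init__(self, name: str, local_arrays: List[int],
                 children: Optional[List["Hub"]] = None) -> None:
        self.name = name
        self.local_arrays = set(local_arrays)
        self.children = children or []

    def route(self, array_id: int) -> Optional[List[str]]:
        """Return the hub names a packet traverses, or None if unreachable."""
        if array_id in self.local_arrays:
            return [self.name]
        for child in self.children:
            path = child.route(array_id)
            if path is not None:
                return [self.name] + path
        return None


# A root with several children models the star; a hub with a single
# child hub models a daisy-chain link.
root = Hub("root", [], [
    Hub("core0", [0x10], [Hub("core0.lsu", [0x11])]),
    Hub("l2", [0x20]),
])
```

For example, `root.route(0x11)` walks root, then core0, then core0.lsu, while an ID no hub owns returns `None`.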
20. Memory Test Network
[Figure: CORE Pipeline, Service Port and System Management Control Unit connected through the ASI Switch Network to a MEMORY with WRITE DATA, ADDRESS, R/W CONTROL and READ DATA]
ASI - Address Space Identifier
21. Memory Test Network
[Figure: as on slide 20, adding the IEEE 1149.1 TAP and the MTCU]
ASI - Address Space Identifier
MTCU - Memory Test Control Unit
22. Memory Test Network
[Figure: as on slide 21, adding the DMTA Port]
ASI - Address Space Identifier
MTCU - Memory Test Control Unit
DMTA - Direct Memory Test Access (Slow Speed)
23. Memory Test Network
[Figure: as on slide 22, adding the DMO Port and a Space/Time Multiplexer]
ASI - Address Space Identifier
MTCU - Memory Test Control Unit
DMTA - Direct Memory Test Access (Slow Speed)
DMO - Direct Memory Observe (High Speed)
24. Memory Test Network
• DFX requirements imposed on ASI network (architectural and implementation)
– Loads and stores on consecutive clock cycles
– Order of transactions maintained during transit
– Direct access to memory via network
– Error checking logic disabled (parity, ECC)
– Data word replication for wide memories
– Broadcast mode (for initialization)
– Network integrity mode (for diagnosis)
25. Central MBIST Programmability
• Parameters of Memory under Test
– ASI ID of Memory
– Routing information (core/unit ID)
– ASI data bits to be masked
– Size of address space
– R/W cycle access time of memory
• Address permutation programmability
– MBIST engine has incrementor/decrementor
– Program bit position of ASI address bit for MBIST sequencer before test
• Debug and bit-mapping support
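The address permutation idea can be illustrated with a counter whose bits are steered to programmable address bit positions. This is a minimal sketch under assumptions: the `bit_map` encoding (counter bit i drives address bit `bit_map[i]`) and both function names are hypothetical, not the MBIST engine's actual programming model.

```python
from typing import List


def permute_address(count: int, bit_map: List[int]) -> int:
    """Map a counter value to an address; counter bit i drives address bit bit_map[i]."""
    addr = 0
    for i, pos in enumerate(bit_map):
        if (count >> i) & 1:
            addr |= 1 << pos
    return addr


def address_sequence(bit_map: List[int], descending: bool = False) -> List[int]:
    """Addresses visited by an incrementing (or decrementing) counter."""
    counts = list(range(1 << len(bit_map)))
    if descending:
        counts.reverse()
    return [permute_address(c, bit_map) for c in counts]
```

With `bit_map = [2, 0, 1]` the fastest-toggling counter bit drives address bit 2, so an incrementing counter visits 0, 4, 1, 5, 2, 6, 3, 7. Every address is still visited exactly once; only the order changes, which is what lets one engine choose, for instance, which physical dimension of an array it steps through fastest.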
26. 3-D Register File
• Stores multiple copies of architectural state
for speculation and threading
– a static portion optimized for area
– an active portion optimized for speed
28. MBIST Algorithm for 3-D Memories
• Static Portion: Only Write Ports
• Active Portion: Write and Read Ports
• RESTORE Function: Transfers contents from
Static to Active Portion
• MBIST Algorithm
– First, test Active array like a typical SRAM
– For Static array
• in place of READ of Static array, do a RESTORE
followed by READ of Active array in next cycle
• align accesses to maintain back-to-back
cycle accesses of March tests
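The substitution of RESTORE plus an Active-array READ for a Static-array READ can be sketched with a small behavioral model. Several things here are assumptions made for illustration: March C- is picked as a representative March test (the slides only say "March tests"), RESTORE is modeled per entry, data is a single bit, and the `RegFile3D` class and its stuck-at fault hook are hypothetical. Cycle alignment is not modeled.

```python
from typing import Callable, Dict, List, Optional

# March C-: (address order, operations per address)
MARCH_C_MINUS = [
    ("up", ["w0"]), ("up", ["r0", "w1"]), ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]), ("down", ["r1", "w0"]), ("up", ["r0"]),
]


class RegFile3D:
    """Behavioral model: Static portion has write ports only."""

    def __init__(self, entries: int, stuck_static: Optional[Dict[int, int]] = None) -> None:
        self.active = [0] * entries
        self.static = [0] * entries
        self.stuck_static = stuck_static or {}  # entry -> stuck-at value

    def write_active(self, addr: int, value: int) -> None:
        self.active[addr] = value

    def read_active(self, addr: int) -> int:
        return self.active[addr]

    def write_static(self, addr: int, value: int) -> None:
        self.static[addr] = self.stuck_static.get(addr, value)

    def restore(self, addr: int) -> None:
        self.active[addr] = self.static[addr]  # Static -> Active transfer


def static_read(rf: RegFile3D, addr: int) -> int:
    rf.restore(addr)             # takes the place of the Static READ
    return rf.read_active(addr)  # READ of Active array in the next cycle


def run_march(entries: int, read: Callable[[int], int],
              write: Callable[[int, int], None]) -> List[int]:
    """Run March C- and return the sorted list of failing addresses."""
    fails = set()
    for order, ops in MARCH_C_MINUS:
        addrs = range(entries) if order == "up" else range(entries - 1, -1, -1)
        for addr in addrs:
            for op in ops:
                value = int(op[1])
                if op[0] == "w":
                    write(addr, value)
                elif read(addr) != value:
                    fails.add(addr)
    return sorted(fails)
```

Testing the Active array first matters in this scheme: a Static-array failure is only attributable to the Static portion once the Active array used to observe it is known to be good.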
32. Determinism for Functional Test: The Problem
• Functional Test required for
– Speed binning
– Timing path debug on ATE
– Repeatability for logic debug in system
– Emulating system behavior on ATE for correlation
• Sources of indeterminism
– Indeterminism in Rx (SerDes receivers)
– Indeterminism in Tx (SerDes transmitters)
– Asynchronous clock domain crossings
• Cache-resident functional test is a partial solution
33. Processor Clock Domains
[Figure: three clock domains: Main Core Clock Domain (2.3 GHz), IO Logical Layer Clock Domain (1.33 GHz) and SerDes Physical Layer Clock Domain (1.33 GHz), with SerDes lanes Tx1 to TxN and Rx1 to RxN]
46. Deterministic Rx path
[Figure: two FIFO diagrams with entries 8 to 0, each with a write pointer (W) and a read pointer (R) separated by READ_DELAY. Rx timeline labels: Rx enables Sync byte detection; Rx enables byte alignment; Aligned? (NO: jog by 1-bit; YES: continue); Rx detects Sync byte; Rx starts incrementing write pointer]
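The Rx timeline can be read as two mechanisms: a 1-bit jog loop that finds the byte boundary, and a FIFO whose read pointer trails a fixed reference by READ_DELAY so that arrival skew is absorbed. The sketch below follows that reading, which is an interpretation of the figure rather than a description of the actual receiver. The sync pattern, the function names, and the cycle model are hypothetical; the depth of 9 matches the entries 8 to 0 in the figure.

```python
from typing import List, Optional, Tuple

SYNC = [1, 0, 1, 1, 1, 1, 0, 0]  # hypothetical sync byte, first bit first


def align(bits: List[int], sync: List[int] = SYNC) -> Optional[int]:
    """Return how many 1-bit jogs are needed before the byte window holds the sync byte."""
    for jog in range(len(sync)):
        if bits[jog:jog + len(sync)] == sync:
            return jog
    return None


def simulate_rx(payload: List[int], skew: int, read_delay: int,
                depth: int = 9) -> List[Tuple[int, int]]:
    """Return (cycle, byte) pairs seen on the read side of the FIFO.

    The write pointer starts `skew` cycles after the reference (sync byte
    detected); the read pointer starts `read_delay` cycles after the reference.
    Valid for 0 <= read_delay - skew < depth.
    """
    fifo: List[Optional[int]] = [None] * depth
    out = []
    for cycle in range(read_delay + len(payload)):
        w = cycle - skew
        if 0 <= w < len(payload):
            fifo[w % depth] = payload[w]        # write side, Rx clock
        r = cycle - read_delay
        if 0 <= r < len(payload):
            out.append((cycle, fifo[r % depth]))  # read side, fixed timing
    return out
```

As long as the skew stays within the window the FIFO can absorb, the read side delivers the same byte on the same cycle regardless of when the lane actually locked, which is the property a cycle-accurate ATE pattern depends on.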
48. Deterministic Functional Test Mode
[Figure: the clock domains of slide 33 (Main Core Clock Domain, 2.3 GHz; IO Logical Layer Clock Domain, 1.33 GHz; SerDes Physical Layer Clock Domain, 1.33 GHz) with lanes Tx1 to TxN and Rx1 to RxN. Annotations: Ratioed 2:1, Synchronous Pointer Passing; Ratioed (1:1), Synchronous, Fixed Phase in Half Data Rate Mode; CDR Output De-skew Alignment]
50. System Test/Debug
• ServiceLink – Serial System Management Interface with Service Processor (SP) as master
• Logic BIST in addition to scan ATPG
• Memory BIST
– Default configuration available via ServiceLink
• Interconnect BIST
– All loopback modes and programmable knobs (phase, amplitude, CDR sampling, etc.) accessible via ServiceLink
– Ability to plot eye diagrams in system
• BIST included in Power-on Self-test (POST)
51. Use of DFX Features in System
• Use of DFX features in enterprise class systems?
– Productization/Engineering
• Early electrical validation of system infrastructure
• Correlation of measurements in ATE versus system environments
– Manufacturing
• High quality test of components in embedded environment
– In Field
• Efficient POST
• Reduction of field NTF (No Trouble Found)
53. Enhancing Product Yield
• Size of core cluster (4 cores) = 58 mm² = 15% of
die
– Defects confined to 30% of the chip still yield chips
with approx. ½ of max throughput
– Small memories below repair criteria add up to a
large number of bits
• DFX features identify partial die configurations
• Information programmed into E-fuse farm
during manufacturing
• Clocks to defective cores are disabled and
Solaris™ does not schedule threads on them
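The figures on this slide can be checked with a short calculation. The sketch assumes a 16-core part organized as four 4-core clusters and assumes throughput scales linearly with the number of working cores; neither assumption is stated on the slide, and the function names are made up for the example.

```python
CLUSTER_AREA_MM2 = 58.0   # from the slide
CLUSTER_FRACTION = 0.15   # from the slide: one cluster is 15% of the die
TOTAL_CLUSTERS = 4        # assumption: 16 cores in 4-core clusters


def die_area_mm2() -> float:
    """Die area implied by the cluster size and its share of the die."""
    return CLUSTER_AREA_MM2 / CLUSTER_FRACTION


def defective_area_fraction(bad_clusters: int) -> float:
    """Share of the die covered by the disabled clusters."""
    return bad_clusters * CLUSTER_FRACTION


def throughput_fraction(bad_clusters: int) -> float:
    """Remaining throughput, assuming linear scaling with core count."""
    return (TOTAL_CLUSTERS - bad_clusters) / TOTAL_CLUSTERS
```

Under these assumptions the die is roughly 387 mm², and a chip with two defective clusters has defects spread over 30% of its area yet still delivers half of the maximum throughput, which matches the slide's claim.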
54. Enhancing RAS
• Logic BIST, Memory BIST and Interconnect
BIST run in the field
• Fault Management module in Solaris™
isolates and reconfigures
– cores, cache ways, cache lines
• Hypervisor can dynamically move workloads
from a core
• Significant improvement in Availability (uptime)
and Mean Time Between Unplanned
System Interruptions (crashes)
56. Conclusions
• Highly re-configurable scan chain architecture
to manage > 1 million flops in CMT designs
• Balance between a central MBIST engine to
cover most arrays and a few dedicated engines
for specialized arrays
• Determinism for functional test/debug will
become more challenging at > 10Gbps – need
more observability on chip
57. Conclusions (contd.)
• Ability to sort partially defective chips critical to
maximizing yield in CMT products
• Defect isolation at thread resolution essential
for acceptable uptimes in systems with CMT
chips
• Modularity and reconfigurability of DFX features
enable faster design and productization of
derivative CMT chips