DFX Architecture for High Performance Multi-core Processors

Ishwar Parulkar
Sun Microsystems, Inc.

Outline

•  Processor Overview
•  DFX Challenges/Opportunities in 3rd Gen CMT
•  Scan flop and chain configurations
•  Embedded Memories
•  Deterministic Functional Test
•  System Test and Debug
•  Enhancing Yield and RAS
•  Conclusions
Processor Overview

[Figure not recoverable from the transcript]

Processor Die Photograph

[Figure: processor die photograph]

Characteristics of 3rd Gen CMT Processors
•  16 or more complex, multi-threaded cores
    –  Scout threads
    –  Execute ahead
    –  Simultaneous speculative threading
    –  Transactional memory
    –  Parallelization of programs
    –  Near-linear scalability for multiple sockets
•  High bandwidth in and out of chip => SerDes
•  Chip configurations with a subset of cores

DFX Challenges in 3rd Gen CMT Processors
•  Amplification of DFX cost because of high degree of replication
  –  Global versus local trade-offs
•  Testing of complex structures
  –  3-D register files; multi-ported memories
•  Testing large-scale implementation of SerDes
•  Deterministic behavior on ATE and in system in presence of non-deterministic SerDes

DFX Opportunities in 3rd Gen CMT Processors
•  Yield enhancement
   –  Binning on throughput performance
•  On-line Availability
   –  Detection and isolation of defective cores and/or thread hardware
•  Rapid design of derivative chip family
   –  Minimal DFX design, verification and test pattern generation effort

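Scan Flop

[Figure: scan flop schematic with data input D and output Q clocked by the functional clock (func clk), and a scan path from scan_in to scan_out clocked by the scan-in clock (scan_in clk)]
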
Choice of Scan Flop

•  Scan path and operation impervious to variation
  –  scan circuits are uniform across flop types - dynamic
     front-ends; pulse clocked
  –  scan path is static and robust across process
     variation
•  Can be extended (by adding one more latch) to
   observe flop state dynamically
•  Consumes less dynamic power because of reduced
   load on functional clock


Scan Chain Architecture
•  Requirements
  –  How do you manage 1.35 million scan flops in a CMT design?
•  Considerations in architecting scan chains
  –  Efficient identification of partial good cores
  –  Partial core chip configurations
  –  Handling of special flops in non-ATPG scenarios
     (e.g. redundancy registers, clock control, Logic BIST, etc.)
  –  Efficiency of scan patterns on ATE
  –  IDS probe loop time for debug
  –  Efficiency of scan-dump in system debug
  –  Usability of scan in presence of scan bugs

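The partial-core considerations above can be illustrated with a small model. This is only a sketch under stated assumptions: the deck does not describe how a core's scan segment is taken out of a chain, so the per-core bypass (a defective core contributing a single bypass flop), the `CoreSegment` type and the flop counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreSegment:
    """One core's portion of a chip-level scan chain (hypothetical model)."""
    name: str
    flops: int              # scan flops this core contributes to the chain
    bypassed: bool = False  # True when the core is defective or not populated


def chain_length(segments, bypass_flops=1):
    """Shift length of the chain; a bypassed core is assumed to add
    only `bypass_flops` flops instead of its full segment."""
    return sum(bypass_flops if seg.bypassed else seg.flops for seg in segments)


def configure_for_partial_good(segments, defective):
    """Return a new configuration with the named cores bypassed."""
    return [CoreSegment(s.name, s.flops, s.name in defective) for s in segments]


if __name__ == "__main__":
    cores = [CoreSegment(f"core{i}", 1000) for i in range(4)]
    print(chain_length(cores))
    print(chain_length(configure_for_partial_good(cores, {"core2"})))
```

In this model the same pattern set can be reused across partial-core configurations because only the shift length changes, which is one plausible reading of the "partial core chip configurations" consideration.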
Scan Chain Architecture

[Figure: chip scan inputs feed m scan chains, which drive the chip scan outputs; the block is labeled "CC Level Scan Configuration"]

Embedded Memory Test - Challenges
•  Scale
   –  >1100 instances of embedded memories
•  Variation of size and type
   –  2MB L2-cache to 8-32 entry queues
•  Complex, specialized arrays
   –  3-D register files; CAM-RAM combinations; multi-ported memories
•  Sun/TI specific test requirements
   –  Direct pin access to large memories
   –  Efficiency and ease of bit-mapping

At-speed Test of Memories via Scan

[Figure: memory array with W Address, W Enable and W Word Lines; R Address, R Enable and R Word Lines; W Data and R Data; a Memory Clock from a Clock Header driven by the Functional Clock and the Scan_In Clock]

SPARC ASI Network
•  Access network on chip corresponding to Address
   Space Identifier (ASI) in SPARC memory model
•  Uses of ASI accesses
  –  Normal operation
          – Chip configuration by software
          – Transfer of information programmed in E-fuse
            farm to internal registers
  –  Failures in field
          – Diagnosis of failures
          – Reconfiguration of chip
  –  Engineering Bring-up
          – Error injection for post-silicon validation of RAS
          – Observability during debug

SPARC ASI Network Implementation

[Figure: Service Port, System Management Control Unit, Switch, ASI Network, CORE Pipeline, and a MEMORY block with WRITE DATA, ADDRESS, R/W CONTROL and READ DATA signals]

•  ASI network is hierarchical
    –  Star and daisy chain
•  ASI routing hubs in units
    –  Packets routed based on destination array ID
•  Dedicated or shared ASI paths
    –  Muxing could be anywhere in the path to array

Memory Test Network

[Figure, built up over four slides. Base: Service Port, System Management Control Unit, Switch, ASI Network, CORE Pipeline, and a MEMORY block with WRITE DATA, ADDRESS, R/W CONTROL and READ DATA signals. Added in turn: IEEE 1149.1 TAP and MTCU; DMTA Port; DMO Port and Space/Time Multiplexer]

ASI - Address Space Identifier
MTCU - Memory Test Control Unit
DMTA - Direct Memory Test Access (Slow Speed)
DMO - Direct Memory Observe (High Speed)

Memory Test Network
•  DFX requirements imposed on ASI network (architectural and implementation)
  –  Loads and stores on consecutive clock cycles
  –  Order of transactions maintained during transit
  –  Direct access to memory via network
  –  Error checking logic disabled (parity, ECC)
  –  Data word replication for wide memories
  –  Broadcast mode (for initialization)
  –  Network integrity mode (for diagnosis)

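The "data word replication for wide memories" requirement can be sketched as follows. The deck does not give the network word width, the memory widths, or how a non-multiple width is handled, so the function name, the widths and the choice to keep the low-order bits are assumptions made for illustration.

```python
def replicate_word(word: int, word_bits: int, memory_bits: int) -> int:
    """Fill a memory word of `memory_bits` by repeating a narrower
    network data word of `word_bits` (hypothetical scheme).

    When memory_bits is not a multiple of word_bits, the low-order
    memory_bits of the repeated pattern are kept (an assumption).
    """
    if word_bits <= 0 or memory_bits <= 0:
        raise ValueError("widths must be positive")
    mask = (1 << word_bits) - 1
    copies = -(-memory_bits // word_bits)  # ceiling division
    wide = 0
    for _ in range(copies):
        wide = (wide << word_bits) | (word & mask)
    return wide & ((1 << memory_bits) - 1)


if __name__ == "__main__":
    print(hex(replicate_word(0xA5, 8, 24)))
```

Replication lets a narrow test data path write a full background pattern into a wide array in one access, at the cost of restricting the patterns to repeats of the narrow word.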
Central MBIST Programmability
•  Parameters of Memory under Test
  –  ASI ID of Memory
  –  Routing information (core/unit ID)
  –  ASI data bits to be masked
  –  Size of address space
  –  R/W cycle access time of memory
•  Address permutation programmability
  –  MBIST engine has incrementor/decrementor
  –  Program bit position of ASI address bit for MBIST sequencer before test
•  Debug and bit-mapping support

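Address permutation can be pictured as a programmable mapping from the bits of the MBIST counter to bit positions of the ASI address. The sketch below is an assumption about how such a mapping might behave, not the engine's actual register format; `bit_map`, `permute_address` and `address_sequence` are hypothetical names.

```python
def permute_address(counter: int, bit_map: list[int]) -> int:
    """Place counter bit i at address bit position bit_map[i]."""
    addr = 0
    for i, pos in enumerate(bit_map):
        if (counter >> i) & 1:
            addr |= 1 << pos
    return addr


def address_sequence(n_bits: int, bit_map: list[int], descending: bool = False) -> list[int]:
    """Addresses visited as the counter increments (or decrements)."""
    if sorted(bit_map) != list(range(n_bits)):
        raise ValueError("bit_map must be a permutation of 0..n_bits-1")
    counts = range(2 ** n_bits - 1, -1, -1) if descending else range(2 ** n_bits)
    return [permute_address(c, bit_map) for c in counts]


if __name__ == "__main__":
    # Swapping the two address bits makes the fastest-changing counter
    # bit drive address bit 1 instead of bit 0.
    print(address_sequence(2, [1, 0]))
    print(address_sequence(2, [1, 0], descending=True))
```

Programming the mapping before the test lets a single incrementor/decrementor walk rows first or columns first in arrays with different physical organizations.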
3-D Register File
  •  Stores multiple copies of architectural state
     for speculation and threading
     –  a static portion optimized for area
     –  an active portion optimized for speed




3-D Register File (Schematic)

[Figure: 3-D register file schematic]

MBIST Algorithm for 3-D Memories
•  Static Portion: Only Write Ports
•  Active Portion: Write and Read Ports
•  RESTORE Function: Transfers contents from
   Static to Active Portion
•  MBIST Algorithm
   –  First, test Active array like a typical SRAM
   –  For Static array
      •  in place of READ of Static array, do a RESTORE
         followed by READ of Active array in next cycle
      •  align accesses to maintain back-to-back
       cycle accesses of March tests
MBIST Algorithm for 3-D Memories

March sequence on a typical SRAM:

Clock Cycles       0     1     2     3     4     5     6
Accesses           R0    W1    R1    R0    W1    R1    R0
Address Seq        X     X     X     X+1   X+1   X+1

Same sequence applied to the Static array:

Clock Cycles       0     1     2     3     4     5     6
Static Address     X     X     X     X+1   X+1   X+1
Static Accesses    ®     W1    ®     ®     W1    ®     ®
Active Accesses          R0    _     R1    R0    _     R1
Active Address           X     X     X     X+1   X+1   X+1

®  = Restore

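The transformation shown in the timing table can be sketched in a few lines: each READ of the Static array becomes a RESTORE in the same cycle followed by a READ of the Active array one cycle later, while WRITEs go to the Static array unchanged. The function name and the string encoding of accesses are illustrative assumptions.

```python
def schedule_static_march(ops):
    """Map a March access list onto Static and Active array accesses.

    ops: per-cycle accesses such as ["R0", "W1", "R1"].
    Returns (static, active) lists, where "_" marks an idle cycle.
    The Active list lags the Static list by one cycle.
    """
    static = []
    active = ["_"]  # nothing has been restored before cycle 0
    for op in ops:
        if op.startswith("R"):
            static.append("RESTORE")  # copy Static contents to Active
            active.append(op)         # read them from Active next cycle
        else:
            static.append(op)         # writes target the Static array
            active.append("_")
    static.append("_")                # pad so both lists have equal length
    return static, active


if __name__ == "__main__":
    ops = ["R0", "W1", "R1"] * 2 + ["R0"]
    static, active = schedule_static_march(ops)
    for cycle, (s, a) in enumerate(zip(static, active)):
        print(cycle, s, a)
```

Because the Static stream keeps one access per cycle, the back-to-back cycle accesses of the March test are preserved even though every read takes two cycles to complete.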
Determinism for Functional Test: The Problem
•  Determinism is required for functional test
  –  Speed binning
  –  Timing path debug on ATE
  –  Repeatability for logic debug in system
  –  Emulate system behavior on ATE for correlation
•  Sources of indeterminism
  –  Indeterminism in Rx (SerDes receivers)
  –  Indeterminism in Tx (SerDes transmitters)
  –  Asynchronous clock domain crossings
•  Cache-resident functional test is a partial solution

Processor Clock Domains

[Figure: three clock domains: Main Core Clock Domain (2.3 GHz), IO Logical Layer Clock Domain (1.33 GHz), and SerDes Physical Layer Clock Domain (1.33 GHz) with lanes Tx1..TxN and Rx1..RxN]

Indeterminism on Tx path

[Figure, built up over four slides: a FIFO with entries 0-8, a write pointer (W) and a read pointer (R), shown for two cases]

Deterministic Tx path

[Figure: the same two FIFO cases with write (W) and read (R) pointers, marked as equal ("=")]

Indeterminism on Rx path

[Figure, built up over five slides: a FIFO with entries 8-0, a read pointer (R) and a write pointer (W), shown for two cases]

Deterministic Rx path

[Figure, built up over five slides: the same two FIFO cases with read (R) and write (W) pointers, a READ_DELAY annotation, and the Rx timeline below]

Rx timeline
•  Rx enables byte alignment
•  Aligned?
  –  NO: jog by 1 bit and check again
  –  YES: Rx enables Sync byte detection
•  Rx detects Sync byte
•  Rx starts incrementing write pointer

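The Rx timeline can be simulated with a toy bit stream. This is a sketch under several assumptions that the deck does not state: alignment is judged by matching a known training byte (`align_byte`), the sync byte value is arbitrary, and the read pointer starts `read_delay` byte-times after the write pointer starts. All names and values are hypothetical.

```python
def to_bits(byte):
    """Byte to a list of 8 bits, most significant first."""
    return [(byte >> (7 - i)) & 1 for i in range(8)]


def from_bits(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def rx_startup(stream, align_byte, sync_byte, read_delay):
    """Return (jogs, write_start, read_start) in aligned byte-times."""
    # Step 1: byte alignment, jogging the window one bit at a time.
    jogs = 0
    while from_bits(stream[jogs:jogs + 8]) != align_byte:
        jogs += 1
        if jogs >= 8:
            raise ValueError("no byte alignment found within 8 jogs")
    # Step 2: sync byte detection on aligned byte boundaries.
    byte_index = 0
    while True:
        start = jogs + 8 * byte_index
        chunk = stream[start:start + 8]
        if len(chunk) < 8:
            raise ValueError("sync byte not found")
        if from_bits(chunk) == sync_byte:
            break
        byte_index += 1
    # Step 3: write pointer starts after the sync byte; the read
    # pointer follows a fixed delay later (assumed use of READ_DELAY).
    write_start = byte_index + 1
    read_start = write_start + read_delay
    return jogs, write_start, read_start


if __name__ == "__main__":
    payload = to_bits(0xBC) * 2 + to_bits(0x5A) + to_bits(0x11)
    for skew in (3, 5):
        print(rx_startup([0] * skew + payload, 0xBC, 0x5A, read_delay=4))
```

The two runs differ in the number of jogs needed but start the write and read pointers at the same positions relative to the sync byte, which is the property that makes the received data stream repeatable.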
Deterministic Functional Test Mode

[Figure: the three clock domains (Main Core at 2.3 GHz, IO Logical Layer at 1.33 GHz, SerDes Physical Layer at 1.33 GHz), annotated with the following crossing modes]
•  Ratioed 2:1, Synchronous, Pointer Passing
•  Ratioed (1:1), Synchronous, Fixed Phase in Half Data Rate Mode
•  Tx1..TxN: Ratioed (1:1), Synchronous, Fixed Phase in Half Data Rate Mode
•  Rx1..RxN: Ratioed (1:1), CDR Output Synchronous, De-skew Alignment

System Test/Debug
•  ServiceLink – Serial System Management Interface with Service Processor (SP) as master
•  Logic BIST in addition to scan ATPG
•  Memory BIST
  –  Default configuration available via ServiceLink
•  Interconnect BIST
  –  All loopback modes and programmable knobs (phase, amplitude, CDR sampling, etc.) accessible via ServiceLink
  –  Ability to plot eye diagrams in system
•  BIST included in Power-on Self-test (POST)

Use of DFX Features in System
•  Use of DFX features in enterprise class systems?
  –  Productization/Engineering
     •  Early electrical validation of system infrastructure
     •  Correlation of measurements in ATE versus system environments
  –  Manufacturing
     •  High quality test of components in embedded environment
  –  In Field
     •  Efficient POST
     •  Reduction of field NTF (No Trouble Found)

Enhancing Product Yield
•  Size of core cluster (4 cores) = 58 mm² = 15% of die
  –  Defects in 30% of chip yield chips with approx. ½ of max throughput
  –  Small memories below repair criteria add up to a large number of bits
•  DFX features identify partial die configurations
•  Information programmed into E-fuse farm during manufacturing
•  Clocks to defective cores are disabled and Solaris™ does not schedule threads on them

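The arithmetic behind the "30% of chip, approx. ½ of max throughput" bullet can be checked with a short calculation. It assumes a 16-core chip (the deck says "16 or more") organized as four 4-core clusters, and throughput proportional to the number of working cores; both are assumptions, and the names are hypothetical.

```python
CLUSTER_AREA_FRACTION = 0.15  # from the slide: one 4-core cluster is 15% of the die
CORES_PER_CLUSTER = 4         # from the slide
TOTAL_CORES = 16              # assumption: the deck says "16 or more" cores


def partial_good_summary(defective_clusters: int) -> tuple[float, float]:
    """Return (fraction of die area lost, fraction of max throughput kept),
    assuming throughput scales linearly with working cores."""
    clusters = TOTAL_CORES // CORES_PER_CLUSTER
    if not 0 <= defective_clusters <= clusters:
        raise ValueError("defective_clusters out of range")
    area_lost = defective_clusters * CLUSTER_AREA_FRACTION
    throughput = (clusters - defective_clusters) / clusters
    return area_lost, throughput


if __name__ == "__main__":
    # Two defective clusters cover 30% of the die and leave 8 of 16 cores.
    print(partial_good_summary(2))
```

Under these assumptions a die with defects confined to two clusters is still sellable as a half-throughput part instead of being scrapped.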
Enhancing RAS
•  Logic BIST, Memory BIST and Interconnect BIST run in the field
•  Fault Management module in Solaris™ isolates and reconfigures
  –  cores, cache ways, cache lines
•  Hypervisor can dynamically move workloads from a core
•  Significant improvement in Availability (uptime) and Mean Time Between Unplanned System Interruptions (crashes)

Conclusions

•  Highly re-configurable scan chain architecture
   to manage > 1 million flops in CMT designs

•  Balance between a central MBIST engine to
   cover most arrays and a few dedicated engines
   for specialized arrays

•  Determinism for functional test/debug will
   become more challenging at > 10Gbps – need
   more observability on chip
Conclusions (contd.)
•  Ability to sort partially defective chips critical to
   maximizing yield in CMT products

•  Defect isolation at thread resolution essential
   for acceptable uptimes in systems with CMT
   chips

•  Modularity and reconfigurability of DFX features
   enable faster design and productization of
   derivative CMT chips
                                                       57

Contenu connexe

Similaire à DFX Architecture for High-performance Multi-core Microprocessors

Oracle rac 10g best practices
Oracle rac 10g best practicesOracle rac 10g best practices
Oracle rac 10g best practices
Haseeb Alam
 
Hari Krishna Vetsa Resume
Hari Krishna Vetsa ResumeHari Krishna Vetsa Resume
Hari Krishna Vetsa Resume
Hari Krishna
 
OpenSPARC T1 Processor
OpenSPARC T1 ProcessorOpenSPARC T1 Processor
OpenSPARC T1 Processor
DVClub
 
Verification Strategy for PCI-Express
Verification Strategy for PCI-ExpressVerification Strategy for PCI-Express
Verification Strategy for PCI-Express
DVClub
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualization
Sisimon Soman
 

Similaire à DFX Architecture for High-performance Multi-core Microprocessors (20)

How to Minimize Cost and Risk for Developing Safety-Certifiable Systems
How to Minimize Cost and Risk for Developing Safety-Certifiable SystemsHow to Minimize Cost and Risk for Developing Safety-Certifiable Systems
How to Minimize Cost and Risk for Developing Safety-Certifiable Systems
 
How to create innovative architecture using VisualSim?
How to create innovative architecture using VisualSim?How to create innovative architecture using VisualSim?
How to create innovative architecture using VisualSim?
 
How to create innovative architecture using VisualSim?
How to create innovative architecture using VisualSim?How to create innovative architecture using VisualSim?
How to create innovative architecture using VisualSim?
 
How to create innovative architecture using ViualSim?
How to create innovative architecture using ViualSim?How to create innovative architecture using ViualSim?
How to create innovative architecture using ViualSim?
 
Oracle rac 10g best practices
Oracle rac 10g best practicesOracle rac 10g best practices
Oracle rac 10g best practices
 
Webinar on RISC-V
Webinar on RISC-VWebinar on RISC-V
Webinar on RISC-V
 
Processors selection
Processors selectionProcessors selection
Processors selection
 
Exploration of Radars and Software Defined Radios using VisualSim
Exploration of  Radars and Software Defined Radios using VisualSimExploration of  Radars and Software Defined Radios using VisualSim
Exploration of Radars and Software Defined Radios using VisualSim
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
 
Oow 2008 yahoo_pie-db
Oow 2008 yahoo_pie-dbOow 2008 yahoo_pie-db
Oow 2008 yahoo_pie-db
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
Hari Krishna Vetsa Resume
Hari Krishna Vetsa ResumeHari Krishna Vetsa Resume
Hari Krishna Vetsa Resume
 
Thaker q3 2008
Thaker q3 2008Thaker q3 2008
Thaker q3 2008
 
OpenSPARC T1 Processor
OpenSPARC T1 ProcessorOpenSPARC T1 Processor
OpenSPARC T1 Processor
 
Architectural tricks to maximize memory bandwidth
Architectural tricks to maximize memory bandwidthArchitectural tricks to maximize memory bandwidth
Architectural tricks to maximize memory bandwidth
 
Verification Strategy for PCI-Express
Verification Strategy for PCI-ExpressVerification Strategy for PCI-Express
Verification Strategy for PCI-Express
 
Intel® RDT Hands-on Lab
Intel® RDT Hands-on LabIntel® RDT Hands-on Lab
Intel® RDT Hands-on Lab
 
Thesis Presentation
Thesis PresentationThesis Presentation
Thesis Presentation
 
NFV Orchestration for Optimal Performance
NFV Orchestration for Optimal PerformanceNFV Orchestration for Optimal Performance
NFV Orchestration for Optimal Performance
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualization
 

DFX Architecture for High-performance Multi-core Microprocessors

  • 1. DFX Architecture for High Performance Multi-core Processors Ishwar Parulkar Sun Microsystems, Inc.
  • 2. Outline •  Processor Overview •  DFX Challenges/Opportunities in 3rd Gen CMT •  Scan flop and chain configurations •  Embedded Memories •  Deterministic Functional Test •  System Test and Debug •  Enhancing Yield and RAS •  Conclusions 2
  • 3. Outline •  Processor Overview •  DFX Challenges/Opportunities in 3rd Gen CMT •  Scan flop and chain configurations •  Embedded Memories •  Deterministic Functional Test •  System Test and Debug •  Enhancing Yield and RAS •  Conclusions 3
  • 6. Outline •  Processor Overview •  DFX Challenges/Opportunities in 3rd Gen CMT •  Scan flop and chain configurations •  Embedded Memories •  Deterministic Functional Test •  System Test and Debug •  Enhancing Yield and RAS •  Conclusions 6
  • 7. Characteristics of 3rd Gen CMT Processors •  16 or more complex, multi-threaded cores – Scout threads – Execute ahead – Simultaneous speculative threading – Transactional memory – Parallelization of programs – Near-linear scalability for multiple sockets •  High bandwidth in and out of chip => Serdes •  Chip configurations with subset of cores 7
  • 8. DFX Challenges in 3rd Gen CMT Processors •  Amplification of DFX cost because of high degree of replication –  global versus local trade-offs •  Testing of complex structures –  3-D register files; multi-ported memories •  Testing large-scale implementation of SerDes •  Deterministic behavior on ATE and in system in presence of non-deterministic SerDes 8
  • 9. DFX Opportunities in 3rd Gen CMT Processors •  Yield enhancement –  Binning on throughput performance •  On-line Availability –  Detection and isolation of defective cores and/ or thread hardware •  Rapid design of derivative chip family –  Minimal DFX design, verification and test pattern generation effort 9
  • 10. Outline •  Processor Overview •  DFX Challenges/Opportunities in 3rd Gen CMT •  Scan flop and chain configurations •  Embedded Memories •  Deterministic Functional Test •  System Test and Debug •  Enhancing Yield and RAS •  Conclusions 10
  • 11. Scan Flop Q func clk func clk func clk D scan_in clk scan_in clk scan_in scan_out 11
  • 12. Choice of Scan Flop •  Scan path and operation impervious to variation –  scan circuits are uniform across flop types - dynamic front-ends; pulse clocked –  scan path is static and robust across process variation •  Can be extended (by adding one more latch) to observe flop state dynamically •  Consumes less dynamic power because of reduced load on functional clock 12
  • 13. Scan Chain Architecture •  Requirements How do yo u manage 1.35 million scan flops in a CMT design? •  Considerations in architecting scan chains –  Efficient identification of partial good cores –  Partial core chip configurations –  Handling of special flops in non-ATPG scenarios (e. g. redundancy registers, clock control, Logic BIST, etc.)‫‏‬ –  Efficiency of scan patterns on ATE –  IDS probe loop time for debug –  Efficiency of scan-dump in system debug –  Usability of scan in presence of scan bugs 13
  • 14. Scan Chain Architecture CC Level Scan Configuration m scan chains chip chip scan inputs scan outputs 14
  • 15. Outline •  Processor Overview •  DFX Challenges/Opportunities in 3rd Gen CMT •  Scan flop and chain configurations •  Embedded Memories •  Deterministic Functional Test •  System Test and Debug •  Enhancing Yield and RAS •  Conclusions 15
  • 16. Embedded Memory Test - Challenges •  Scale –  >1100 instances of embedded memories •  Variation of size and type –  2MB L2-cache to 8-32 entry queues •  Complex, specialized arrays –  3-D register files; CAM-RAM combinations; multi- ported memories •  Sun/TI specific test requirements –  Direct pin access to large memories –  Efficiency and ease of bit-mapping 16
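  • 17. At-speed Test of Memories via Scan [Figure: memory with a write port (W Address, W Word Lines, W Enable, W Data) and a read port (R Address, R Word Lines, R Enable, R Data); Memory Clock produced by a Clock Header with a 1/0 selection between Functional Clock and Scan_In Clock]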
  • 18. SPARC ASI Network •  Access network on chip corresponding to Address Space Identifier (ASI) in SPARC memory model •  Uses of ASI accesses –  Normal operation: chip configuration by software; transfer of information programmed in E-fuse farm to internal registers –  Failures in field: diagnosis of failures; reconfiguration of chip –  Engineering bring-up: error injection for post-silicon validation of RAS; observability during debug
  • 19. SPARC ASI Network Implementation •  ASI network is hierarchical –  Star and daisy chain •  ASI routing hubs in units –  Packets routed based on destination array ID •  Dedicated or shared ASI paths –  Muxing could be anywhere in the path to array [Figure: CORE Pipeline, ASI Switch Network, System Service Port and Management Control Unit; MEMORY with WRITE DATA, ADDRESS, R/W CONTROL and READ DATA signals]
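
The routing idea on this slide (hubs that forward packets toward a destination array ID) can be sketched as a small tree of hubs. This is only an illustrative model: the packet fields `core_id`, `unit_id` and `array_id`, the `Hub` and `Array` classes, and the example topology are all hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class AsiPacket:
    # Hypothetical packet layout: routing fields plus a memory access.
    core_id: int
    unit_id: int
    array_id: int
    write: bool
    address: int
    data: int = 0


@dataclass
class Array:
    # Leaf of the network: a memory array reached through the hubs.
    cells: dict = field(default_factory=dict)

    def access(self, pkt):
        if pkt.write:
            self.cells[pkt.address] = pkt.data
            return None
        return self.cells.get(pkt.address, 0)


@dataclass
class Hub:
    # A routing hub forwards on one packet field (hypothetical scheme).
    key: str
    children: dict = field(default_factory=dict)

    def route(self, pkt):
        child = self.children[getattr(pkt, self.key)]
        if isinstance(child, Hub):
            return child.route(pkt)   # descend one level of the hierarchy
        return child.access(pkt)      # reached the destination array


def build_example_network():
    # Two cores, each with two units; unit 0 holds array 0, unit 1 holds array 3.
    def core():
        return Hub("unit_id", {
            0: Hub("array_id", {0: Array()}),
            1: Hub("array_id", {3: Array()}),
        })
    return Hub("core_id", {0: core(), 1: core()})
```

A write followed by a read to the same destination returns the written data, while the same address in a different core's array is unaffected, which is the isolation a destination-ID routing scheme is meant to give.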
  • 20. Memory Test Network [Figure: CORE Pipeline connected through the ASI Switch Network to a MEMORY (WRITE DATA, ADDRESS, R/W CONTROL, READ DATA), with System Service Port and Management Control Unit. ASI = Address Space Identifier]
  • 21. Memory Test Network [Figure: as slide 20, adding the IEEE 1149.1 TAP and the MTCU. MTCU = Memory Test Control Unit]
  • 22. Memory Test Network [Figure: as slide 21, adding the DMTA Port. DMTA = Direct Memory Test Access (Slow Speed)]
  • 23. Memory Test Network [Figure: as slide 22, adding the DMO Port and a Space/Time Multiplexer. DMO = Direct Memory Observe (High Speed)]
  • 24. Memory Test Network •  DFX requirements imposed on ASI network (architectural and implementation) –  Loads and stores on consecutive clock cycles –  Order of transactions maintained during transit –  Direct access to memory via network –  Error checking logic disabled (parity, ECC) –  Data word replication for wide memories –  Broadcast mode (for initialization) –  Network integrity mode (for diagnosis)
  • 25. Central MBIST Programmability •  Parameters of Memory under Test –  ASI ID of Memory –  Routing information (core/unit ID) –  ASI data bits to be masked –  Size of address space –  R/W cycle access time of memory •  Address permutation programmability –  MBIST engine has incrementor/decrementor –  Program bit position of ASI address bit for MBIST sequencer before test •  Debug and bit-mapping support
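
One way to picture the address permutation on this slide is a programmable bit map applied to the output of the incrementor/decrementor. The sketch below is an assumption about how such a mapping could work, not the actual MBIST engine: `bit_map`, `permute_address` and `address_sequence` are hypothetical names, and `bit_map[i]` is taken to mean the ASI address bit position driven by counter bit `i`.

```python
def permute_address(counter, bit_map):
    """Move counter bit i to address bit bit_map[i] (hypothetical mapping)."""
    addr = 0
    for src_bit, dst_bit in enumerate(bit_map):
        if (counter >> src_bit) & 1:
            addr |= 1 << dst_bit
    return addr


def address_sequence(bit_map, descending=False):
    """Addresses visited as the counter steps through the whole address space."""
    counts = range(2 ** len(bit_map))
    if descending:
        counts = reversed(counts)   # models the decrementor
    return [permute_address(c, bit_map) for c in counts]
```

With `bit_map = [2, 0, 1]` the fastest-toggling counter bit drives address bit 2, so consecutive accesses jump by 4 while every address is still visited exactly once. Reprogramming the map before a test changes the physical access order without changing the sequencer itself.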
  • 26. 3-D Register File •  Stores multiple copies of architectural state for speculation and threading –  a static portion optimized for area –  an active portion optimized for speed
  • 27. 3-D Register File (Schematic) [Figure: schematic]
  • 28. MBIST Algorithm for 3-D Memories •  Static Portion: Only Write Ports •  Active Portion: Write and Read Ports •  RESTORE Function: Transfers contents from Static to Active Portion •  MBIST Algorithm –  First, test Active array like a typical SRAM –  For Static array: in place of READ of Static array, do a RESTORE followed by READ of Active array in next cycle; align accesses to maintain back-to-back cycle accesses of March tests
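
The substitution described above (a RESTORE followed by a READ of the Active array standing in for a READ of the Static array) can be sketched with a toy model. Everything here is a simplifying assumption: one static entry per active entry, one bit per entry, a short march sequence chosen for illustration, and an injected stuck-at fault to show detection. Cycle alignment is not modeled.

```python
class RegFile3D:
    """Toy model: each active entry is backed by one static entry (assumption)."""

    def __init__(self, depth, stuck_at=None):
        self.static = [0] * depth
        self.active = [0] * depth
        self.stuck_at = stuck_at or {}   # addr -> stuck value (injected fault)

    def write_static(self, addr, bit):
        # The static portion has write ports only.
        self.static[addr] = self.stuck_at.get(addr, bit)

    def restore(self, addr):
        # RESTORE transfers static contents into the active portion.
        self.active[addr] = self.static[addr]

    def read_active(self, addr):
        return self.active[addr]


def read_static(rf, addr):
    rf.restore(addr)              # cycle n: RESTORE
    return rf.read_active(addr)   # cycle n+1: READ of the Active array


def march_static(rf):
    """Illustrative march: up(w0); up(r0, w1, r1); down(r1, w0, r0)."""
    depth = len(rf.static)
    failures = []

    def check(addr, expected, step):
        if read_static(rf, addr) != expected:
            failures.append((addr, step))

    for addr in range(depth):
        rf.write_static(addr, 0)
    for addr in range(depth):
        check(addr, 0, "up r0")
        rf.write_static(addr, 1)
        check(addr, 1, "up r1")
    for addr in reversed(range(depth)):
        check(addr, 1, "down r1")
        rf.write_static(addr, 0)
        check(addr, 0, "down r0")
    return failures
```

A fault-free model returns an empty failure list, while `RegFile3D(8, stuck_at={5: 0})` fails both reads of 1 at address 5, even though the static array is never read directly.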
  • 29. MBIST Algorithm for 3-D Memories [Timing diagram] Clock cycles: 0 1 2 3 4 5 6 | Accesses: R0 W1 R1 R0 W1 R1 R0 | Address sequence: Address X, then Address X+1
  • 30. MBIST Algorithm for 3-D Memories [Timing diagrams] Typical SRAM: clock cycles 0 1 2 3 4 5 6 | accesses R0 W1 R1 R0 W1 R1 R0 | address sequence Address X, then Address X+1. 3-D memory: clock cycles 0 1 2 3 4 5 6 | static accesses ® W1 ® ® W1 ® ® | static address sequence Address X, then Address X+1 | active accesses R0 _ R1 R0 _ R1 | active address sequence Address X, then Address X+1. ® = Restore
  • 32. Determinism for Functional Test: The Problem •  Functional test required for –  Speed binning –  Timing path debug on ATE –  Repeatability for logic debug in system –  Emulating system behavior on ATE for correlation •  Sources of indeterminism –  Indeterminism in Rx (SerDes receivers) –  Indeterminism in Tx (SerDes transmitters) –  Asynchronous clock domain crossings •  Cache-resident functional test is a partial solution
  • 33. Processor Clock Domains [Figure: three clock domains (Main Core, IO Logical Layer, Serdes Physical Layer) with frequency labels of 1.33 GHz on two domains and 2.3 GHz on one; lanes Tx1 to TxN and Rx1 to RxN]
  • 34-37. Indeterminism on Tx path [Figure, built up over four slides: buffer entries numbered 0 to 8 with write (W) and read (R) pointers, shown for two cases with different pointer positions]
  • 38. Deterministic Tx path [Figure: the two buffer cases from the previous slides (entries 0 to 8, W and R pointers) marked as equal]
  • 39-43. Indeterminism on Rx path [Figure, built up over five slides: buffer entries numbered 8 down to 0 with write (W) and read (R) pointers, shown for two cases]
  • 44-45. Deterministic Rx path [Figure, built up over two slides: buffer entries numbered 8 down to 0 with write (W) and read (R) pointers, shown for two cases]
  • 46-47. Deterministic Rx path [Figure, built up over two slides: buffer entries numbered 8 down to 0 with write (W) and read (R) pointers, plus an Rx timeline] Rx timeline: Rx enables byte alignment; Aligned? (NO: jog by 1 bit; YES: continue); Rx enables Sync byte detection; Rx detects Sync byte; Rx starts incrementing write pointer; READ_DELAY
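
The Rx timeline above can be sketched as a small receiver model: jog one bit at a time until byte-aligned, wait for the Sync byte, and only then start incrementing the write pointer. The byte values `ALIGN` and `SYNC`, the two-window alignment check and the buffer depth are hypothetical choices made for this illustration, and READ_DELAY is not modeled.

```python
ALIGN = 0xBC   # hypothetical training/alignment pattern
SYNC = 0x5A    # hypothetical Sync byte value


def to_bits(byte_values):
    """Serialize bytes MSB first."""
    bits = []
    for value in byte_values:
        bits.extend((value >> i) & 1 for i in range(7, -1, -1))
    return bits


def byte_at(bits, pos):
    """Reassemble the 8-bit window starting at bit position pos."""
    value = 0
    for bit in bits[pos:pos + 8]:
        value = (value << 1) | bit
    return value


def receive(bits, depth=9):
    """Return (jogs, writes), where writes is a list of (write_pointer, byte)."""
    # Byte alignment: jog by 1 bit until two consecutive windows show ALIGN.
    offset = 0
    while not (byte_at(bits, offset) == ALIGN and byte_at(bits, offset + 8) == ALIGN):
        offset += 1
        if offset + 16 > len(bits):
            raise ValueError("alignment pattern not found")
    # Sync byte detection on byte boundaries.
    pos = offset
    while byte_at(bits, pos) != SYNC:
        pos += 8
        if pos + 8 > len(bits):
            raise ValueError("Sync byte not found")
    pos += 8
    # The write pointer starts incrementing only after the Sync byte.
    writes = []
    wptr = 0
    while pos + 8 <= len(bits):
        writes.append((wptr % depth, byte_at(bits, pos)))
        wptr += 1
        pos += 8
    return offset, writes
```

Feeding the same stream with different numbers of leading skew bits changes the jog count but not where each payload byte lands relative to the write pointer, which is the repeatability that a deterministic Rx path needs.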
  • 48. Deterministic Functional Test Mode [Figure: Main Core, IO Logical Layer and Serdes Physical Layer clock domains (frequency labels 1.33 GHz, 1.33 GHz, 2.3 GHz) with lanes Tx1 to TxN and Rx1 to RxN; labels on the paths and domain crossings: Ratioed (1:1) Synchronous; Ratioed 2:1 Synchronous; Fixed Phase in Half Data Rate Mode; CDR Output De-skew Alignment; Pointer Passing]
  • 50. System Test/Debug •  ServiceLink – Serial System Management Interface with Service Processor (SP) as master •  Logic BIST in addition to scan ATPG •  Memory BIST –  Default configuration available via ServiceLink •  Interconnect BIST –  All loopback modes and programmable knobs (phase, amplitude, CDR sampling, etc.) accessible via ServiceLink –  Ability to plot eye diagrams in system •  BIST included in Power-on Self-test (POST)
  • 51. Use of DFX Features in System •  Use of DFX features in enterprise class systems? –  Productization/Engineering: early electrical validation of system infrastructure; correlation of measurements in ATE versus system environments –  Manufacturing: high quality test of components in embedded environment –  In Field: efficient POST; reduction of field NTF (No Trouble Found)
  • 53. Enhancing Product Yield •  Size of core cluster (4 cores) = 58 mm² = 15% of die –  Defects in 30% of the chip still yield chips with approx. ½ of max throughput –  Small memories below repair criteria add up to a large number of bits •  DFX features identify partial die configurations •  Information programmed into E-fuse farm during manufacturing •  Clocks to defective cores disabled and Solaris™ disallows scheduling threads on them
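
The numbers on this slide can be checked with a few lines of arithmetic. The sketch assumes a 16-core part (slide 7 says "16 or more"), that a defect disables a whole 4-core cluster, and that throughput scales linearly with the number of enabled cores; none of these assumptions is stated on this slide.

```python
CLUSTER_AREA_MM2 = 58.0     # from the slide
CLUSTER_DIE_FRACTION = 0.15  # from the slide
CORES_PER_CLUSTER = 4       # from the slide
TOTAL_CORES = 16            # assumption: smallest configuration on slide 7


def die_area_mm2():
    """Die area implied by one cluster being 15% of the die."""
    return CLUSTER_AREA_MM2 / CLUSTER_DIE_FRACTION


def defective_area_fraction(bad_clusters):
    """Fraction of the die covered by the defective clusters."""
    return bad_clusters * CLUSTER_DIE_FRACTION


def throughput_fraction(bad_clusters):
    """Remaining throughput, assuming linear scaling with enabled cores."""
    good_cores = TOTAL_CORES - bad_clusters * CORES_PER_CLUSTER
    return good_cores / TOTAL_CORES
```

Under these assumptions the die is roughly 387 mm², and two defective clusters cover 30% of it while leaving 8 of 16 cores, which matches the slide's "approx. ½ of max throughput".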
  • 54. Enhancing RAS •  Logic BIST, Memory BIST and Interconnect BIST run in the field •  Fault Management module in Solaris™ isolates and reconfigures –  cores, cache ways, cache lines •  Hypervisor can dynamically move workloads from a core •  Significant improvement in Availability (uptime) and Mean Time Between Unplanned System Interruptions (crashes)
  • 56. Conclusions •  Highly re-configurable scan chain architecture to manage > 1 million flops in CMT designs •  Balance between a central MBIST engine to cover most arrays and a few dedicated engines for specialized arrays •  Determinism for functional test/debug will become more challenging at > 10Gbps – need more observability on chip
  • 57. Conclusions (contd.) •  Ability to sort partially defective chips critical to maximizing yield in CMT products •  Defect isolation at thread resolution essential for acceptable uptimes in systems with CMT chips •  Modularity and reconfigurability of DFX features enables faster design and productization of derivative CMT chips