Intel® Core™ Microarchitecture


           Intel® Software College


Objectives


After completion of this module you will be able to describe
• Components of an IA processor
• Working flow of the instruction pipeline
• Notable features of the architecture




                                                           Intel® Processor Micro-architecture - Core® microarchitecture

       2
      Copyright © 2006, Intel Corporation. All rights reserved.
   Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.


Agenda


Introduction
Knowledge preparation
Notable features
Micro-architecture tour
Coding considerations




Industry Recognition

PC Format, May 2006
“Intel Strikes Back! Conroe is the name. Pistol-whipping Athlon 64s into burger meat is the game...”

Intel's Next Generation Microarchitecture Unveiled, Real World Tech
“Just as important as the technical innovations in Core MPUs, this microarchitecture will have a profound impact on the industry.”

Intel Dishes the Knockout Punch to AMD with Conroe, GD Hardware.com
“…the results were far more than we could hope for and it'll be amusing to see AMD's response to this beat-down session”

Intel Regains Performance Crown, Anandtech
“… At 2.8 or 3.0GHz, a Conroe EE would offer even stronger performance than what we’ve seen here.”

Intel Reveals Conroe Architecture, Extremetech
“… And not only was the Intel system running at 2.66GHz, a slower clock rate than the top Pentium 4, it was outpacing an overclocked Athlon 64 FX-60. Wrap your brain around that idea for a bit…”

Conroe Benchmarks - Intel Showing Big Strength, Hot Hardware.com
“… Intel is poised to change the face of the desktop computing landscape…”


 Performance Summary

 Intel® Core™ Microarchitecture dramatically boosts Intel
 platform performance
 • Conroe & Woodcrest drive clear Desktop/Server performance
   leadership
 • Merom extends Intel Mobile performance leadership

 Intel® Core™ Microarchitecture-based platforms set the
 bar in Performance and Energy Efficiency for the Multi-
 Core era
 • Intel’s 3rd generation dual-core (while the competition is stuck on its
   1st generation)
 • New Intel high-performance ‘engine’: Wider, Smarter, Faster, More
   Efficient

 Best Processor on the Planet: Energy-Efficient Performance
 • 20% (Merom), 40% (Conroe), 80% (Woodcrest) performance boosts (1)
 The “Core™ Effect”: Intel® Core™ Microarchitecture ramp fuels broad roadmap acceleration

 (1) Based on SPECint*_rate_base2000


Agenda


Introduction
Knowledge preparation
• Architecture VS Microarchitecture
• CISC VS RISC
• Performance Measurements
• Pipeline Design
• Power and Energy
• Chip Multi-Processing
Notable features
Micro-architecture tour
Coding considerations


         Architecture and Micro-architecture

What is Computer Architecture?
• Architecture is the set of features which are externally visible:
  •   Instruction set
  •   Registers
  •   Addressing modes
  •   Bus protocols
Intel Architectures (IA)
• IA32/X86 (8-bit, 16-bit and 32-bit Integer architecture)
  •   X87 (Floating Point extension)
  •   MMX (Multi-Media extension)
  •   SSE, SSE2, SSE3 (Streaming SIMD Extensions)
• Intel® 64/EM64T (64-bit Integer extension of IA32)
• IA64 (Intel's new 64-bit architecture)
  •   Itanium/Itanium 2 processor family



Architecture and Micro-architecture (cont.)


What is Micro-architecture?
• Also written µ-architecture or u-architecture
• “Invisible” features that provide meaningful value to the end
  user (whatever makes you buy a new compatible PC)
 • Programs run faster → improved performance
 • Reduced power consumption → extended battery life
 • H/W fits into a smaller form factor






      Intel® Architecture History

Architecture: instruction set definition and compatibility
• Examples: EPIC* (Itanium®), IA-32, IXA* (XScale)

Microarchitecture: hardware implementation maintaining instruction set compatibility with the high-level architecture
• Examples: P5, P6, Intel NetBurst®, Banias

Processors: productized implementation of a microarchitecture
• Examples:
  •   P5: Pentium®
  •   P6: Pentium® Pro, Pentium® II/III
  •   Intel NetBurst®: Pentium® 4, Pentium® D, Xeon®
  •   Banias: Pentium® M

* IXA: Intel Internet Exchange Architecture; EPIC: Explicitly Parallel Instruction Computing



Intel® Core™ Microarchitecture Processors



[Diagram: Intel® NetBurst® + Mobile Microarchitecture + New Innovations → Intel® Core™ 2 Duo/Quad/Extreme processors]



RISC Approach to CPU design
  (RISC = Reduced Instruction Set Computers)
     Optimize H/W for common basic operations
     • Fixed instruction length
           •        Shorter Execution Pipeline
           •        Ease of Instruction Level Parallelism
     • Large number of registers
           •        Fewer memory accesses
     • ‘Load/Store’ architecture
           •        Shorter Execution Pipeline
           •        Ease of advancing Loads
     • Branch Hints
           •        Reduce pipeline flush events
     • ‘Exotic’ stuff to be implemented in S/W with minimal H/W support
           •        No ‘complex’ H/W instructions
           •        Handle exceptional conditions in S/W
     Examples: MIPS, IBM Power and PowerPC, Sun Sparc

               Achieve maximum performance by the
               right partitioning between H/W and S/W



CISC Approach to CPU design

     (CISC = Complex Instruction Set Computers)
  Rich architecture
  • Variable length instructions.
  • Complex addressing modes.
  On-chip HW / SW partitioning required
  • H/W keeps executing ‘simple’ stuff
  • Complex instructions are ‘emulated’ using u-code routines
    from ROM
  • More instructions treated as ‘simple’ as more H/W is available
  COMPATIBILITY has some major advantages:
  • Large (and forever increasing) software base
  • Code development tools
  • Expertise
  • H/W - S/W spiral
  Example: Intel IA32, Motorola 680X0

           Maximize information passed to the HW


Performance Measurement
 Performance is the reciprocal of the “Time of execution”:

 \[ \text{Performance} \approx \frac{1}{\text{Time of Execution}} = \frac{1}{L \cdot CPI \cdot T_c} \]

 Where:
 L   = Code Length (# of machine instructions)
 CPI = Clock cycles Per Instruction
 Tc  = Clock period (ns)

 Substitute:
 IPC = Instructions Per Cycle = 1/CPI
 F   = Frequency = 1/Tc

 \[ \text{Performance} \approx \frac{IPC \cdot F}{L} \]

 Three levers follow from the formula:
 • Raise IPC: improve ILP
 • Raise F: improve timing
 • Reduce L: architecture enhancements
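
 As a rough illustration of the formula above, the following Python sketch computes execution time and relative performance. The workload size, IPC values, and clock frequency are hypothetical numbers chosen for the example, not measurements of any Intel processor.

```python
def execution_time(code_length, cpi, clock_period_s):
    """Time of execution = L * CPI * Tc (seconds)."""
    return code_length * cpi * clock_period_s


def performance(ipc, freq_hz, code_length):
    """Performance ~ IPC * F / L (runs of this code per second)."""
    return ipc * freq_hz / code_length


# Hypothetical workload: one million machine instructions.
L = 1_000_000

# Same frequency, but the second design sustains a higher IPC.
base = performance(ipc=1.0, freq_hz=2.0e9, code_length=L)
wider = performance(ipc=1.5, freq_hz=2.0e9, code_length=L)

print("speed-up from higher IPC:", wider / base)
print("baseline execution time (s):", execution_time(L, cpi=1.0, clock_period_s=0.5e-9))
```

 Because performance is a product of terms, a gain in IPC at constant frequency and code length translates directly into the same relative gain in performance.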



Performance Measurement (cont.)

Performance considerations:
• Which code/application to run?
• Which OS?
• Which other components in the platform?
• Under which thermal conditions?
• Multithreading? Multiprocessing?

Benchmark examples:
• Industry standard
  •   SPEC (ISPEC, FSPEC)
  •   TPC
• Commercial
  •   SysMark
  •   MobileMark
  •   PCMark
  •   Sandra
  •   ScienceMark
• Applications
  •   Video (Windows Media encoder, DivX)
  •   Audio (Lame MP3)
  •   Compression (RAR)
  •   Content creation (3DSM, Photoshop, Premiere)
  •   Latest games (Doom III, FarCry, but this changes fast)
• Specific industries use specific benchmarks
  •   Linux compilation, POVRay, LinPack, lmbench







Design Considerations for Different
Market Segments
Constraints by segment:
• Desktop: thermally and area constrained
• Extreme: unconstrained
• Value: very area constrained
• Mobile: thermally, energy and area constrained
• Servers: thermally and energy constrained
Micro-architecture is the Art of Tradeoffs between:
• Schedule
• Requirements / Standards
• Performance
• Features
• Power / Energy
• Area / Cost



Design Metrics


IPC = Instructions per Cycle
• The more the better
Latency – same as Response Time
• The time interval between
  •      when any request for data is made and
  •      when the data transfer completes
• The less the better
Throughput
• The amount of work completed by the system per unit of time.
• The more the better
• ops/sec




CPU Pipeline


Break the work to smaller pieces
• Four basic stages of instruction life
  •      Fetch - bring instruction to core
  •      Decode - read operands from register
  •      Execute - perform the operation
  •      Writeback - save result to register
• Execution timing of simple instructions
  (legend: “op src1, src2 → dst”)
  add eax, ebx → eax    F  D  E  W
  sub ecx, edx → ecx       F  D  E  W
Increased throughput
• increased number of completed instructions per cycle
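
The timing diagram above can be reproduced with a small Python sketch. It assumes an idealized pipeline (one instruction issued per cycle, every stage takes one cycle, no stalls); the function names are hypothetical and exist only for this illustration.

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Writeback


def pipeline_schedule(instructions):
    """Map each instruction to the cycle in which it occupies each stage.

    Idealized model: instruction i enters Fetch in cycle i + 1 and moves
    one stage forward every cycle (no stalls, no dependencies).
    """
    return {
        instr: {stage: i + s + 1 for s, stage in enumerate(STAGES)}
        for i, instr in enumerate(instructions)
    }


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles until the last instruction leaves the pipeline."""
    return n_stages + n_instructions - 1 if n_instructions else 0


def unpipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles if each instruction must finish before the next one starts."""
    return n_stages * n_instructions


program = ["add eax, ebx", "sub ecx, edx"]
for instr, cycles in pipeline_schedule(program).items():
    print(f"{instr:14s}", cycles)
print("pipelined:", pipelined_cycles(len(program)),
      "unpipelined:", unpipelined_cycles(len(program)))
```

For long instruction streams the pipelined cycle count approaches one instruction per cycle, which is the throughput gain the slide refers to.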




Pipeline Design - Explore Parallelism
A new instruction does not always depend on the previous one
•       Can start a new instruction before the previous one is finished
•       ...if different stages use different H/W resources
Run instructions in parallel (pipeline)
add eax, ebx → eax    F  D  E  W
sub ecx, edx → ecx       F  D  E  W
or  edi, esi → edi          F  D  E  W
Need to balance pipe stages
•       Each stage should take the same time for best throughput and utilization
•       The clock cycle is determined by the longest path!

Fetch   Decode  Exec    WB
        Fetch   Decode  Exec    WB
                Fetch   Decode  Exec    WB
                        Fetch   Decode  Exec    WB
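
To see why stage balance matters, here is a minimal Python sketch under the same idealized no-stall model. The stage delays are hypothetical values picked so that both designs have the same total logic delay; only the distribution across stages differs.

```python
def clock_period(stage_delays_ns):
    """The clock must be slow enough for the slowest stage."""
    return max(stage_delays_ns)


def run_time_ns(n_instructions, stage_delays_ns):
    """Total time for n instructions in an ideal pipeline (no stalls)."""
    cycles = len(stage_delays_ns) + n_instructions - 1
    return cycles * clock_period(stage_delays_ns)


# Hypothetical delays for Fetch, Decode, Exec, WB (nanoseconds).
balanced = [1.0, 1.0, 1.0, 1.0]
unbalanced = [0.5, 0.5, 2.0, 1.0]  # same 4.0 ns total, but Exec dominates

n = 1000
print("balanced:  ", run_time_ns(n, balanced), "ns")
print("unbalanced:", run_time_ns(n, unbalanced), "ns")
```

Both pipelines do the same amount of work per instruction, yet the unbalanced one runs at the pace of its slowest stage while the faster stages sit idle for part of every cycle.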


Pipeline Design – Fighting Stalls


Data flow dependency (instructions output/input)
• Solved by bypasses, register renaming, etc.
Control flow dependencies
• Solved by branch prediction
Others (Cache misses, long latency instructions)
• Solved by other dynamic scheduling techniques



Race of CISC vs. RISC


In modern CPUs Advanced µ-Architecture Techniques minimize the
advantages of RISC over CISC
• Branch Prediction
 • Reduces the effect of extra pipeline stages
• Register Renaming
 • Effectively Increase the Number of Registers
• Out Of Order
 • Reduce Number of stalls caused by shortage of registers
• Speculative Execution
 • Further Reduce Number of stalls
• Power saving features
 • Reduce the overhead when not needed.




µop – Intel’s Take on the CISC/RISC Race


(CISC) instructions are translated into one or more (RISC)
uops (micro-operations)
• Fixed format
• Wide and simple
• Temp registers
Usually one uop per instruction
A complex instruction can expand to thousands of uops
Stores are divided into two uops (STA and STD)
Fusion plays games here





Power and Energy

Maximum power (TDP):
•    Cooling requirements
•    Cooling solution
•    Computer form factor and acoustic noise
Average power
•    Battery life
•    Electricity bill
General calculation:
• P = frequency * voltage^2 * activity factor * capacitance + leakage
Reducing TDP
• Fewer transistors and wires
• Smaller transistors and wires
• Power features → less activity
• Low leakage transistors
Reducing average power
• Energy efficiency
• Power states
• Lower leakage
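
The general power calculation above can be explored with a short Python sketch. All numeric values (frequency, voltage, activity factor, capacitance, leakage) are hypothetical and chosen only to show how the terms interact; they do not describe any real processor.

```python
def dynamic_power_w(freq_hz, voltage_v, activity, capacitance_f):
    """Dynamic term: frequency * voltage^2 * activity factor * capacitance."""
    return freq_hz * voltage_v ** 2 * activity * capacitance_f


def total_power_w(freq_hz, voltage_v, activity, capacitance_f, leakage_w):
    """P = dynamic power + leakage."""
    return dynamic_power_w(freq_hz, voltage_v, activity, capacitance_f) + leakage_w


# Hypothetical operating point.
nominal = dict(freq_hz=2.0e9, voltage_v=1.2, activity=0.2, capacitance_f=50e-9)

# Hypothetical lower power state: frequency and voltage both scaled to 80%.
scaled = dict(nominal, freq_hz=nominal["freq_hz"] * 0.8,
              voltage_v=nominal["voltage_v"] * 0.8)

ratio = dynamic_power_w(**scaled) / dynamic_power_w(**nominal)
print("dynamic power ratio after scaling:", round(ratio, 3))
print("total power at nominal point (W):",
      round(total_power_w(**nominal, leakage_w=10.0), 1))
```

Because voltage enters squared, lowering voltage together with frequency cuts dynamic power by roughly the cube of the scaling factor, which is why power states are effective at reducing average power.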






Dual/Multi Core and SMT
 Put more than one core per package
 Architectural change:
    • Software must be multi-threaded or multi-process
    • …but backward compatible with multiprocessor systems (MP)
 Several ways of implementing it
    • All of them being used


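 [Diagram: three dual-core package layouts: (a) each core with its own LLC and its own I/O; (b) each core with its own LLC, sharing one I/O; (c) two cores sharing one LLC and one I/O]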



 SMT: Run two (or more) threads on the same core, simultaneously

Intel Approach


[Timeline diagram; legend: State, Execution Units, Cache, Bus]
• Q4 2000: Intel® Pentium®, 1 thread
• Q2 2003: Intel® Pentium® with HT, 2 threads
• Q2 2005: Intel® Pentium® D Processor, 2 threads
• Q3 2006: Intel® Core 2 Duo®, 2 threads
• Q4 2006: Intel® XQ6700*, 4 threads
• ?: 80 threads

While single core performance has increased due to clock speed,
increased cache and improved ILP, the biggest performance increases
have come from thread-level parallelism.



An “Acronym Cheat Sheet” of Parallel
Computing
CMP: Chip Multi Processor (two or more cores per package)
• Dual Core: two cores in same package
• Quad Core: four cores in same package
DP: Dual Processor (two packages)
MP: Multi Processor (four or more packages)
SMT: Simultaneous Multi Threading (virtual multi core: HyperThreading)






Agenda


Introduction
Knowledge preparation
Notable features
• Wide Dynamic Execution
• Smart Memory Access
• Advanced Smart Cache
• Advanced Digital Media Boost
• Intelligent Power Capability
Micro-architecture tour
Coding considerations






Intel® Core™ Micro-architecture Notable Features
Intel® Wide Dynamic Execution
• 14-stage efficient pipeline
  • Wider execution path
  • Advanced branch prediction
  • Macro-fusion
    • Roughly 15% of all instructions are conditional branches
    • Macro-fusion fuses a comparison and the jump that follows it, reducing the micro-ops running down the pipeline
  • Micro-fusion
    • Merges the load and operation micro-ops into one micro-op
• 64-Bit Support
  • Merom, Conroe, and Woodcrest support EM64T

[Figure: pipeline block diagram. Blocks: Instruction Fetch and PreDecode; Instruction Queue; Decode (with uCode ROM); Rename/Alloc; Retirement Unit (ReOrder Buffer); Schedulers; three execution stacks (ALU, Branch, MMX/SSE, FPmove; ALU, FAdd, MMX/SSE, FPmove; ALU, FMul, MMX/SSE, FPmove) plus Load and Store; L1 D-Cache and D-TLB; 2M/4M shared L2 Cache; FSB of up to 10.4 Gb/s. Path widths are labeled 5, 4, and 4.]




Intel® Core™ Micro-architecture Notable Features (cont.)
Intel® Smart Memory Access
• Improved prefetching
• Memory disambiguation
  • Advances a load ahead of an earlier store even when a data dependency (pointer conflict) between them is possible
    • Earlier loads hide memory latencies
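
The pointer-conflict case that memory disambiguation deals with can be sketched in C. This is a minimal illustration, not code from the deck: the function name `scale_then_read` is hypothetical, and whether the hardware actually hoists the load on a given run cannot be observed from C. The point is only that the load's result depends on the store when the two pointers alias, and is independent of it when they do not.

```c
#include <assert.h>
#include <stddef.h>

/* A store through `dst` followed by a load through `src`.
 * If dst and src point at different objects, the load does not depend on
 * the store and could be started early to hide memory latency.
 * If they point at the same object, the load must observe the stored value.
 * (Hypothetical helper, used here only to illustrate the two cases.) */
int scale_then_read(int *dst, const int *src, int value)
{
    assert(dst != NULL && src != NULL);
    *dst = value * 2;   /* store: the earlier memory operation */
    return *src + 1;    /* load: may or may not conflict with the store */
}
```

Calling it with two distinct variables exercises the no-conflict case; passing the same address for both arguments exercises the conflict case, where the returned value reflects the store.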








Intel® Core™ Micro-architecture Notable Features (cont.)
Intel® Advanced Smart Cache
• Multi-core optimization
  • Shared between the two cores
  • Advanced Transfer Cache architecture
  • Reduced bus traffic
  • Both cores have full access to the entire cache
  • Dynamic cache sizing






Intel® Core™ Micro-architecture Notable Features (cont.)
Advantages of Shared Cache
[Figure: CPU1 and CPU2 sit below the Front Side Bus (FSB) and Memory; a cache line is shipped from one CPU to the other.]
• Shipping an L2 cache line: ~half of an access to memory




Intel® Core™ Micro-architecture Notable Features (cont.)
Advantages of Shared Cache (cont.)
[Figure: CPU1 and CPU2 sit below the Front Side Bus (FSB) and Memory and share one cache line.]
• L2 is shared: no need to ship the cache line






Intel® Core™ Micro-architecture Notable Features (cont.)
Intel® Advanced Digital Media Boost
• Single Cycle SIMD Operation
  • 8 Single Precision Flops/cycle
  • 4 Double Precision Flops/cycle
• Wide Operations
  • 128-bit packed Add
  • 128-bit packed Multiply
  • 128-bit packed Load
  • 128-bit packed Store
• Support for Intel® EM64T instructions

[Figure: SIMD Operation (SSE/SSE2/SSE3/SSSE3). SOURCE holds X4, X3, X2, X1 in bits 127..0; DEST holds Y4, Y3, Y2, Y1; an SSE/2/3 OP combines them lane by lane. Core™ µarch: X4opY4, X3opY3, X2opY2, and X1opY1 all complete in clock cycle 1. Previous: X2opY2 and X1opY1 in clock cycle 1, then X4opY4 and X3opY3 in clock cycle 2.]
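
The lane-by-lane picture and the flops-per-cycle figures can be worked through with a small portable C model. This is a sketch under stated assumptions, not Intel code: the function names are hypothetical, the packed add is modeled with a scalar loop rather than real SSE instructions, and reading "8 single-precision flops per cycle" as four lanes times one packed add plus one packed multiply per cycle is an interpretation based on the FAdd and FMul units in the block diagram, which the slide itself does not spell out.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* four 32-bit floats fill one 128-bit register */

/* Scalar model of a packed single-precision add: dest[i] = x[i] + y[i]. */
void packed_add_ps(const float x[LANES], const float y[LANES], float dest[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        dest[i] = x[i] + y[i];
}

/* Cycles needed to push one 128-bit packed operation through an execution
 * unit that is unit_bits wide (must divide 128 evenly). */
unsigned cycles_per_packed_op(unsigned unit_bits)
{
    assert(unit_bits != 0 && 128u % unit_bits == 0);
    return 128u / unit_bits;
}

/* Peak flops per cycle = lanes per packed op * packed ops issued per cycle.
 * The ops-per-cycle value is an assumption supplied by the caller. */
unsigned peak_flops_per_cycle(unsigned lanes, unsigned ops_per_cycle)
{
    return lanes * ops_per_cycle;
}
```

With a 128-bit unit the model gives one cycle per packed operation, and with a 64-bit unit two cycles, matching the "Core™ µarch" and "Previous" rows of the figure. Four single-precision lanes times two operations per cycle gives 8, and two double-precision lanes times two gives 4.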





Intel® Core™ Micro-architecture Notable Features
Intel® Advanced Digital Media Boost
• Additional Media Instructions - Supplemental Streaming SIMD Extensions 3 (SSSE3)
  • 16 new packed integer instructions
  • Targeting video encode/decode
• Significantly improved string operations
  • REP MOVS and REP STOS
    • ~8 bytes / cycle throughput
      • mileage may vary






Intel® Core™ Micro-architecture Notable Features
Intel® Advanced Digital Media Boost
• Supplemental SSE3 (SSSE3)
  • Horizontal Addition/Subtraction: PHADDW, PHADDSW, PHADDD, PHSUBW, PHSUBSW, PHSUBD
  • Packed Absolute Values: PABSB, PABSW, PABSD
  • Multiply and Add Packed Signed/Unsigned Bytes: PMADDUBSW
  • Packed Multiply High with Round and Scale: PMULHRSW
  • Packed Shuffle Bytes: PSHUFB
  • Packed SIGN: PSIGNB/W/D
  • Packed Align Right: PALIGNR
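
To make three of these instructions concrete, the following portable C functions model the 128-bit forms of PHADDW, PABSW, and PSHUFB element by element. They are scalar reference sketches with hypothetical names (`phaddw_model` and so on), not intrinsics, and they require that the output array does not overlap the inputs.

```c
#include <stddef.h>
#include <stdint.h>

/* PHADDW model: add adjacent 16-bit pairs. The low four results come from
 * the first operand, the high four from the second. Sums wrap modulo 2^16. */
void phaddw_model(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    for (size_t i = 0; i < 4; ++i) {
        out[i]     = (int16_t)(uint16_t)((uint16_t)a[2 * i] + (uint16_t)a[2 * i + 1]);
        out[i + 4] = (int16_t)(uint16_t)((uint16_t)b[2 * i] + (uint16_t)b[2 * i + 1]);
    }
}

/* PABSW model: absolute value of each signed 16-bit element. The result is
 * unsigned, so the absolute value of -32768 is representable as 32768. */
void pabsw_model(const int16_t a[8], uint16_t out[8])
{
    for (size_t i = 0; i < 8; ++i) {
        int32_t v = a[i];
        out[i] = (uint16_t)(v < 0 ? -v : v);
    }
}

/* PSHUFB model: each output byte is selected from src by the low four bits
 * of the matching mask byte, or zeroed when the mask byte's top bit is set. */
void pshufb_model(const uint8_t src[16], const uint8_t mask[16], uint8_t out[16])
{
    for (size_t i = 0; i < 16; ++i)
        out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}
```

A mask of 15, 14, ..., 0 passed to `pshufb_model` reverses the byte order of the source, which is the kind of permutation the video encode/decode use case tends to need.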




Intel® Core™ Micro-architecture Notable Features (cont.)
Intelligent Power Capability
• Advanced power gating & dynamic power coordination
  • Multi-point demand-based switching
  • Voltage-Frequency switching separation
  • Supports transitions to deeper sleep modes
  • Event blocking
  • Clock partitioning and recovery
  • Dynamic Bus Parking
  • During periods of high performance execution, many parts of the chip core can be shut off






Agenda


Introduction
Knowledge preparation
Notable features
Micro-architecture tour
• Front End
• Out-Of-Order Execution Core
• Memory Sub-system
Coding considerations






Intel® Core™ Micro-architecture Drill-down

[Figure: block diagram of the core. Front end: icache, predecode, branch prediction unit, instruction queue, instruction decode, MS. Execution core: register alias table, ALLOC, Reservation Station, Re-Order Buffer, and execution units (store address, load, store data, integer / FP / SIMD (3x)). Memory sub-system: page miss handler, data cache unit, memory order buffer.]


Agenda


Introduction
Knowledge preparation
Notable features
Micro-architecture tour
• Front End
• Out-Of-Order Execution Core
• Memory Sub-system
Coding considerations






Core™ Micro-architecture Front End

Instruction preparation before execution
• Instruction Fetch Unit
• Instruction Queue
• Instruction Decode Unit
• Branch Prediction Unit

[Figure: front-end blocks: icache, predecode, branch prediction unit, instruction queue, instruction decode, MS.]

Intel® Core™ Microarchitecture – Front End

  Instruction Queue


  Buffer between the instruction pre-decode unit and the decoder
  • Up to six predecoded instructions written per cycle
  • Holds up to 18 instructions
  • Up to five instructions read from the IQ per cycle
  Potential loop cache
  Loop Stream Detector (LSD) support
  • Re-use of decoded instructions
  • Potential power saving




Intel® Core™ Microarchitecture – Front End

  Macro-Fusion

  Roughly 15% of all instructions are conditional branches.
  Macro-fusion merges two instructions into a single micro-op, as if the two instructions were a single long instruction.
  Enhanced Arithmetic Logic Unit (ALU) for macro-fusion. Each macro-fused instruction executes with a single dispatch.
  Not supported in EM64T long mode

  [Figure: the Scheduler dispatches the fused micro-op "cmpjae eax, [mem], label" to Execution and Branch Eval; flags and target go to write back.]

Intel® Core™ Microarchitecture – Front End


  Macro-Fusion Absent

  Read four instructions from the Instruction Queue
  Each instruction gets decoded into separate uops

  Enabling Example
  for (int i=0; i<100000; i++) {
                   …
  }

  Instruction Queue
    addps xmm0, [EAX+16]
    mulps xmm0, xmm0
    movps [EAX+240], xmm0
    cmp eax, 100000
    jge label

  Cycle 1
    dec0  addps xmm0, [EAX+16]
    dec1  mulps xmm0, xmm0
    dec2  movps [EAX+240], xmm0
    dec3  cmp eax, 100000
  Cycle 2
    dec0  jge label

Intel® Core™ Microarchitecture – Front End


  Macro-Fusion Present

  Read five instructions from the Instruction Queue
  Send the fusable pair to a single decoder
  A single uop represents two instructions

  Enabling Example
  for (unsigned int i=0; i<100000; i++) {
                   …
  }

  Instruction Queue
    addps xmm0, [EAX+16]
    mulps xmm0, xmm0
    movps [EAX+240], xmm0
    cmp eax, 100000
    jae label

  Cycle 1
    dec0  addps xmm0, [EAX+16]
    dec1  mulps xmm0, xmm0
    dec2  movps [EAX+240], xmm0
    dec3  cmpjae eax, 100000, label
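
The only source-level difference between the two enabling examples is the type of the loop counter. The sketch below puts both forms side by side as complete functions. The names `sum_signed` and `sum_unsigned` and the summing loop body are illustrative assumptions, since the slides elide the body, and which compare-and-jump pair a compiler actually emits depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Signed counter: the loop-closing test is a signed comparison, which the
 * "Macro-Fusion Absent" slide shows as cmp + jge decoded separately. */
float sum_signed(const float *v, int n)
{
    assert(v != NULL || n <= 0);
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unsigned counter: the loop-closing test is an unsigned comparison, which
 * the "Macro-Fusion Present" slide shows as cmp + jae fused into one uop. */
float sum_unsigned(const float *v, unsigned int n)
{
    assert(v != NULL || n == 0);
    float s = 0.0f;
    for (unsigned int i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

Both functions return the same result for the same input; the difference the slides point at is only in how the closing compare and branch are decoded.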


Intel® Core™ Microarchitecture – Front End

  Instruction Decode / Micro-Op Fusion


  Frequent pairs of micro-operations derived from the same
  Macro Instruction can be fused into a single micro-operation




                                   Micro-op fusion effectively widens the pipeline

Intel® Core™ Microarchitecture – Front End

  Instruction Decode / Micro-Op Fusion (cont.)

  u-ops of a Store “movps [EAX+240], xmm0”

    Without micro-op fusion (two u-ops):
      sta eax+240
      std xmm0, [eax+240]
    With micro-op fusion (one u-op):
      st xmm0, [eax+240]
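
The combined effect of both kinds of fusion on the five-instruction example loop can be tallied with a small C model. This is a counting sketch under stated assumptions, not a description of the decoder: the struct and function names are hypothetical, and the per-instruction u-op counts assume that a load-plus-operation instruction and a store each split into two u-ops when unfused, as the micro-fusion slides describe.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction of the example loop, described by how it decodes. */
struct insn {
    const char *text;
    unsigned    uops_unfused;          /* u-ops with no fusion at all */
    int         micro_fusable;         /* load+op or sta+std pair */
    int         macro_fused_with_prev; /* jump that folds into the cmp before it */
};

static const struct insn loop_body[] = {
    { "addps xmm0, [EAX+16]",  2, 1, 0 },  /* load + add */
    { "mulps xmm0, xmm0",      1, 0, 0 },
    { "movps [EAX+240], xmm0", 2, 1, 0 },  /* sta + std */
    { "cmp eax, 100000",       1, 0, 0 },
    { "jae label",             1, 0, 1 },
};

enum { LOOP_BODY_LEN = sizeof loop_body / sizeof loop_body[0] };

/* Count the u-ops sent down the pipeline for one loop iteration. */
unsigned count_uops(const struct insn *body, size_t n,
                    int micro_fusion, int macro_fusion)
{
    assert(body != NULL);
    unsigned total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (macro_fusion && body[i].macro_fused_with_prev)
            continue;                       /* already counted with the cmp */
        if (micro_fusion && body[i].micro_fusable)
            total += 1;                     /* the pair travels as one u-op */
        else
            total += body[i].uops_unfused;
    }
    return total;
}
```

Under these assumptions the loop costs 7 u-ops per iteration with no fusion, 5 with micro-op fusion only, and 4 with both, which is the sense in which fusion "effectively widens the pipeline".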




Intel® Core™ Microarchitecture – Front End

  Branch Prediction Improvements


  Intel® Pentium® 4 Processor branch prediction
  PLUS the following two improvements:
  • Indirect Branch Predictor
  • Loop Detector

  Branch mispredictions reduced by >20%
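
Why a loop detector helps can be shown with a toy simulation. The slides do not describe how the hardware predictor is built, so everything below is an assumption made for illustration: a classic 2-bit saturating counter stands in for a conventional predictor, and the "loop detector" simply remembers the trip count of the previous run of the loop. Both function names are hypothetical.

```c
#include <assert.h>

/* 2-bit saturating counter: states 0..3, predicts "taken" when state >= 2.
 * Returns the number of mispredictions of the loop-closing branch when a
 * loop of trip_count iterations is executed `runs` times. */
unsigned mispredicts_two_bit(unsigned trip_count, unsigned runs)
{
    unsigned state = 2, misses = 0;
    for (unsigned r = 0; r < runs; ++r) {
        for (unsigned i = 0; i < trip_count; ++i) {
            int taken = (i + 1 < trip_count);  /* last iteration falls through */
            int predicted = (state >= 2);
            if (predicted != taken)
                ++misses;
            if (taken) { if (state < 3) ++state; }
            else       { if (state > 0) --state; }
        }
    }
    return misses;
}

/* Toy loop detector: after the first run it knows the trip count and
 * predicts the exit exactly on the final iteration. */
unsigned mispredicts_loop_detector(unsigned trip_count, unsigned runs)
{
    unsigned learned = 0;   /* 0 means no trip count learned yet */
    unsigned misses = 0;
    for (unsigned r = 0; r < runs; ++r) {
        for (unsigned i = 0; i < trip_count; ++i) {
            int taken = (i + 1 < trip_count);
            int predicted = !(learned != 0 && i + 1 == learned);
            if (predicted != taken)
                ++misses;
        }
        learned = trip_count;   /* remember this run's iteration count */
    }
    return misses;
}
```

For a 10-iteration loop executed 5 times, the counter mispredicts the exit on every run while the toy detector mispredicts only the first one. The model says nothing about the >20% figure on the slide, which is Intel's own measurement.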




Agenda


Introduction
Knowledge preparation
Notable features
Micro-architecture tour
• Front End
• Out-Of-Order Execution Core
• Memory Sub-system
Coding considerations






Core™ Micro-architecture Execution Core

Accepts decoded u-ops, assigns resources, executes and retires u-ops
• Renamer
• Reservation station (RS)
• Issue ports
• Execution Unit

[Figure: execution-core blocks: register alias table, ALLOC, Reservation Station, Re-Order Buffer, and execution units (store address, load, store data, integer / FP / SIMD (3x)).]




Intel® Core™ Microarchitecture – Execution Core

  Execution Core Building Blocks

  [Figure: block diagram. Renamer, RS and ROB; ports 0, 1, 5 to the execution units (SIMD/integer, MUL, integer, floating point); port 2 load and ports 3, 4 store to the memory sub-system]

  Issue Ports and Execution Units
  6 dispatch ports from RS
  • 3 execution ports
     • (shared for integer / fp / simd)
  • load
  • store (address)
  • store (data)
  128-bit SSE implementation
  • Port 0 has packed multiply (pipelined; 4 cycles SP, 5 cycles DP)
  • Port 1 has packed add (3 cycles, all precisions)
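
The latencies above matter most for dependent chains of operations. The following is a rough sketch only: it assumes a fully pipelined unit that can start one operation per cycle, and the function name and the round-robin interleaving of chains are illustrative assumptions, not details from the slide.

```python
def finish_time(num_ops, latency, num_chains=1):
    """Estimate the cycle at which the last result is available.

    Assumptions (hypothetical model): one port that can start one operation
    per cycle, and num_ops operations dealt round-robin over num_chains
    independent dependency chains.
    """
    ready = [0] * num_chains   # cycle at which each chain's latest result is ready
    port_free = 0              # next cycle in which the port can start an operation
    for i in range(num_ops):
        chain = i % num_chains
        # An operation waits for the port and for its chain's previous result.
        start = max(port_free, ready[chain])
        ready[chain] = start + latency
        port_free = start + 1
    return max(ready)


# Eight packed multiplies at the 4-cycle latency quoted above:
print(finish_time(8, latency=4, num_chains=1))  # one long dependency chain
print(finish_time(8, latency=4, num_chains=4))  # four independent accumulators
```

Under these assumptions, splitting a reduction over several independent accumulators lets the pipelined unit overlap operations instead of waiting out the full latency each time.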





  Retirement Unit


  ReOrder Buffer (ROB)
  • Holds micro-ops in various stages of completion
  • Buffers completed micro-ops
  • Updates the architectural state in order
  • Manages ordering of exceptions



  [Figure: register alias table, ALLOC, Re-Order Buffer, Reservation Station]
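
The in-order retirement described above can be illustrated with a small model. This is a sketch under simplifying assumptions (u-ops are numbered in program order, and exceptions and retirement-width limits are ignored); the function name is hypothetical.

```python
from collections import deque


def retire_in_order(completion_order):
    """Return, for each completion event, the u-op ids that retire.

    u-ops are allocated in program order 0..n-1. completion_order lists the
    ids in the (possibly out-of-order) sequence in which they finish executing.
    """
    rob = deque(range(len(completion_order)))  # oldest u-op on the left
    done = set()
    retired_per_event = []
    for uop in completion_order:
        done.add(uop)
        retired_now = []
        # Only the oldest entry may retire, and only once it has completed.
        while rob and rob[0] in done:
            retired_now.append(rob.popleft())
        retired_per_event.append(retired_now)
    return retired_per_event


# u-op 2 finishes first but has to wait behind u-ops 0 and 1.
print(retire_in_order([2, 0, 1, 3]))
```

Completed u-ops stay buffered until every older u-op has also completed, which is what keeps the architectural state and exception order consistent with program order.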



Agenda


Introduction
Knowledge preparation
Notable features
Micro-architecture tour
• Front End
• Out-Of-Order Execution Core
• Memory Sub-system
Coding considerations






Core® Micro-architecture Memory Sub-System
Memory Ordering Buffer
• Store Address Buffer (SAB)
 • Holds the address of each store that has not yet been performed
 • Each load compares its address against every store older than itself
      • If it finds a hole…
• Store Data Buffer
 • Holds the data of each store that has not yet been performed
 • If a load hits in the SAB, the data is forwarded from here
• Load Buffer
 • Holds the addresses of non-retired loads
 • Used for snoops and re-dispatch
• One 128-bit load and one 128-bit store per cycle to different
  memory locations
• Out-of-order memory operations
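
The address check against older stores can be sketched as follows. This is a simplified model, not the hardware algorithm: it assumes exact address matches between same-sized accesses, and the `PendingStore` type and `load_lookup` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingStore:
    seq: int       # program-order sequence number
    address: int   # entry in the store address buffer
    data: int      # entry in the store data buffer


def load_lookup(stores: List[PendingStore], load_seq: int,
                load_address: int) -> Optional[int]:
    """Return forwarded data for a load, or None if it must read the cache.

    Only stores older than the load are considered, and the youngest
    matching store wins (simplifying assumption: exact address match).
    """
    hit = None
    for store in stores:
        if store.seq < load_seq and store.address == load_address:
            if hit is None or store.seq > hit.seq:
                hit = store
    return hit.data if hit is not None else None


pending = [PendingStore(1, 0x100, 11),
           PendingStore(3, 0x100, 33),
           PendingStore(5, 0x200, 55)]
print(load_lookup(pending, load_seq=4, load_address=0x100))  # youngest older store
print(load_lookup(pending, load_seq=4, load_address=0x200))  # store 5 is younger
```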

Intel® Core™ Microarchitecture – Memory Sub-system


  Core® Micro-architecture Memory Sub-System (cont.)
  32 KB D-Cache (8-way, 64-byte line size)
  Shared second-level (L2) instruction and data cache: 2 MB 8-way or 4 MB 16-way
  Cache-to-cache transfer
  • Improves producer / consumer style MP
  Wider interface to L2
  • Reduced interference
    • Processor line fill is 2 cycles
  Higher bandwidth from the L2 cache to the core
  • ~14 clock latency and 2 clock throughput
  Load & Store Access order
     1.    L1 cache of immediate core
     2.    L1 cache of the other core
     3.    L2 cache
     4.    Memory

  [Figure: Core1 and Core2 above a shared 2 MB L2 cache, connected to the bus]
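
The D-cache parameters above determine how many sets the cache has and how an address is split. The sketch below assumes conventional power-of-two indexing by the low address bits, which the slide does not state; the function names are illustrative.

```python
def cache_geometry(size_bytes=32 * 1024, ways=8, line_bytes=64):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


sets, offset_bits, index_bits = cache_geometry()
print(sets, offset_bits, index_bits)
print(split_address(0x12345, offset_bits, index_bits))
```

With these parameters, addresses that are a multiple of 4 KB apart map to the same set, so more than eight such lines in active use compete for the eight ways.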



  Advanced Memory Access / Enhanced Data Pre-fetch Logic
  Predicts which data will be needed next and loads it into the cache,
  using hardware and/or software prefetch

  [Figure: parking analogy. Door = L1 cache, valet parking area = L2 cache, main parking lot = external memory]





  Advanced Memory Access / Enhanced Data Pre-fetch Logic (cont.)
  • L1D cache prefetching
    • Data Cache Unit (DCU) Prefetcher
         • Known as the streaming prefetcher
         • Recognizes ascending access patterns in recently loaded data
         • Prefetches the next line into the processor's cache
    • Instruction Based Stride Prefetcher
         • Prefetches based upon a load having a regular stride
         • Can prefetch forward or backward 2 Kbytes
             • 1/2 default page size
  • L2 cache prefetching: Data Prefetch Logic (DPL)
    • Prefetches data to the 2nd level cache before the DCU requests
      the data
    • Maintains 2 tables for tracking loads
         • Upstream – 16 entries
         • Downstream – 4 entries
    • Every load is either found in the DPL or generates a new entry
    • Upon recognition of the 2nd load of a “stream” the DPL will
      prefetch the next load
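
The idea behind an instruction-based stride prefetcher can be sketched as a table keyed by the load's instruction address. This is a toy model, not the hardware design: requiring two equal strides before prefetching, and reading the 2 Kbyte figure as a cap on the stride, are both assumptions made for illustration.

```python
MAX_DISTANCE = 2048  # 2 Kbytes forward or backward, as quoted above


class StridePrefetcher:
    """Toy per-instruction stride detector (hypothetical model)."""

    def __init__(self):
        # load instruction address -> (last data address, last observed stride)
        self.table = {}

    def access(self, load_ip, addr):
        """Record a load and return an address to prefetch, or None."""
        prefetch = None
        if load_ip in self.table:
            last_addr, last_stride = self.table[load_ip]
            stride = addr - last_addr
            # Assumption: prefetch once the same non-zero stride repeats.
            if stride != 0 and stride == last_stride and abs(stride) <= MAX_DISTANCE:
                prefetch = addr + stride
            self.table[load_ip] = (addr, stride)
        else:
            self.table[load_ip] = (addr, None)
        return prefetch


pf = StridePrefetcher()
for address in (1000, 1064, 1128):
    print(pf.access(load_ip=0x400, addr=address))
```

Because the table is keyed by the load instruction, two loops walking different arrays with different strides do not disturb each other's pattern in this model.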

  Advanced Memory Access / Memory Disambiguation
  Memory Disambiguation predictor
  • Loads that are predicted NOT to forward from a preceding store
    are allowed to schedule as early as possible
     • This increases the performance of OOO memory pipelines

  Disambiguated loads checked at retirement
  • Extension to existing coherency mechanism
  • Invisible to software and system





  Advanced Memory Access / Memory Disambiguation Absent
  Load4 must WAIT until previous stores complete

     Instruction    Location
     Store1         Y
     Load2          Y
     Store3         W
     Load4          X

  [Figure: memory holding Data W, Data Z, Data Y and Data X]

  Advanced Memory Access / Memory Disambiguation Present
  Loads can decouple from stores
  Load4 can get its data WITHOUT waiting for stores

     Instruction    Location
     Load4          X
     Store1         Y
     Load2          Y
     Store3         W

  [Figure: memory holding Data W, Data Z, Data Y and Data X]
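
The two cases can be compared with a small scheduling model. This is a sketch only: the cycle numbers are invented for illustration, the function names are hypothetical, and the real predictor and recovery mechanism are not described on these slides.

```python
def earliest_load_cycle(older_store_cycles, load_ready_cycle, predicted_no_alias):
    """Cycle at which a load may execute.

    older_store_cycles: cycles at which each older store completes.
    Without a no-alias prediction, the load waits for all older stores.
    """
    if predicted_no_alias:
        return load_ready_cycle
    return max([load_ready_cycle] + list(older_store_cycles))


def prediction_was_correct(older_store_locations, load_location):
    """A no-alias prediction holds if no older store wrote the load's location."""
    return load_location not in older_store_locations


# Load4 reads X; the older stores (Store1 to Y, Store3 to W) finish at
# hypothetical cycles 5 and 9, and Load4 itself is ready at cycle 1.
print(earliest_load_cycle([5, 9], 1, predicted_no_alias=False))
print(earliest_load_cycle([5, 9], 1, predicted_no_alias=True))
print(prediction_was_correct({"Y", "W"}, "X"))
print(prediction_was_correct({"Y"}, "Y"))  # Load2 does depend on Store1
```

In this model Load4 is a safe candidate for early scheduling, while Load2 reads the location written by Store1 and should not be predicted as independent.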

  Advanced Memory Access / Store Forwarding
  If a load follows a store and reloads the data that the store
  writes to memory, the micro-architecture can forward the data
  directly from the store to the load

  [Figure: Store1 (Y) and Load2 (Y) connected through the internal buffers, with Data Y in memory]


Advanced Memory Access / Store Forwarding: Aligned Store Cases

 Store size    Aligned loads shown within the store
 16 bit        1 x 16 bit, 2 x 8 bit
 32 bit        1 x 32 bit, 2 x 16 bit, 4 x 8 bit
 64 bit        1 x 64 bit, 2 x 32 bit, 4 x 16 bit, 8 x 8 bit
 128 bit       1 x 128 bit, 2 x 64 bit, 4 x 32 bit, 8 x 16 bit, 16 x 8 bit


Advanced Memory Access / Store Forwarding: Unaligned Cases
Note that unaligned store forwarding does not occur when the load
crosses a cache line boundary

 Store size    Loads shown within the store
 16 bit        1 x 16 bit‡, 2 x 8 bit
 32 bit        1 x 32 bit‡, 2 x 16 bit (first marked ‡), 4 x 8 bit
 64 bit        1 x 64 bit, 2 x 32 bit (first marked ‡), 4 x 16 bit (first marked ‡), 8 x 8 bit

[Figure legend: shading distinguishes "Store forwarded to load" from "No forwarding"]
‡: No forwarding if the load crosses a cache line boundary
Note: Unaligned 128-bit stores are issued as two 64-bit stores.
This provides two alignments for store forwarding
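
Two of the conditions above can be checked mechanically: the load must read only bytes written by the store, and it must not cross a cache line boundary. The sketch below encodes just these necessary conditions; it does not reproduce the per-size alignment cases from the figures, and the function names are hypothetical.

```python
CACHE_LINE = 64  # bytes; the 64-byte line size quoted for the L1 D-cache


def crosses_cache_line(addr, size):
    """True if the access [addr, addr + size) touches two cache lines."""
    return addr // CACHE_LINE != (addr + size - 1) // CACHE_LINE


def may_forward(store_addr, store_size, load_addr, load_size):
    """Necessary (not sufficient) conditions for store-to-load forwarding.

    The load must lie entirely inside the bytes written by the store and
    must not cross a cache line boundary.
    """
    contained = (store_addr <= load_addr and
                 load_addr + load_size <= store_addr + store_size)
    return contained and not crosses_cache_line(load_addr, load_size)


print(may_forward(0x100, 8, 0x104, 4))  # upper half of a 64-bit store
print(may_forward(0x100, 4, 0x102, 4))  # load reads bytes the store did not write
print(may_forward(0x3E, 8, 0x3E, 8))    # load straddles the line ending at 0x3F
```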


Agenda


Introduction
Knowledge preparation
Notable features
Micro-architecture tour
Coding considerations






Optimizing for Instruction Fetch and PreDecode
Avoid “Length Changing Prefixes” (LCPs)
• Affects instructions with immediate data or offset
• Operand Size Override (66H)
• Address Size Override (67H) [obsolete]
• LCPs change the length decoding algorithm – increasing the
  processing time from one cycle to six cycles (or eleven cycles
  when the instruction spans a 16-byte boundary)
• The REX (EM64T) prefix (4xH) is not an LCP
 • The REX prefix does lengthen the instruction by one byte, so use
   of the first eight general registers in EM64T is preferred
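
The cycle counts above can be turned into a small cost estimate. This sketch simply encodes the three figures from the slide (one, six and eleven cycles); the function names are illustrative, and deciding whether a given instruction actually carries an LCP is left to the caller.

```python
def spans_16_byte_boundary(start_addr, length):
    """True if an instruction of `length` bytes straddles a 16-byte boundary."""
    return start_addr // 16 != (start_addr + length - 1) // 16


def predecode_cycles(has_lcp, spans_boundary=False):
    """Length-decode cost in cycles, using the figures quoted above."""
    if not has_lcp:
        return 1
    return 11 if spans_boundary else 6


# A 4-byte instruction with an LCP, first placed at offset 0x10, then at 0x0E.
print(predecode_cycles(True, spans_16_byte_boundary(0x10, 4)))
print(predecode_cycles(True, spans_16_byte_boundary(0x0E, 4)))
print(predecode_cycles(False))
```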






Optimizing for Instruction Queue
Includes a “Loop Stream Detector” (LSD)
• Potentially very high bandwidth instruction streaming
• A number of requirements to make use of the LSD
 •      Maximum of 18 instructions in up to four 16-byte packets
 •      No RET instructions (hence, little practical use for CALLs)
 •      Up to four taken branches allowed
 •      Most effective at 70+ iterations
• LSD is after PreDecode so there is no added cost for LCPs
• Weigh use of the LSD against conventional loop unrolling
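
The LSD requirements listed above can be collected into a checklist function. This is a sketch that encodes only the limits stated on the slide; the `Loop` record and its field names are hypothetical, and a real loop may be subject to further conditions not listed here.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    start_addr: int       # address of the first instruction in the loop body
    size_bytes: int       # total code size of the loop body
    instructions: int
    taken_branches: int
    has_ret: bool
    iterations: int


def fetch_packets(loop):
    """Number of 16-byte fetch packets touched by the loop body."""
    first = loop.start_addr // 16
    last = (loop.start_addr + loop.size_bytes - 1) // 16
    return last - first + 1


def lsd_candidate(loop):
    """Check the structural limits quoted above."""
    return (loop.instructions <= 18 and
            fetch_packets(loop) <= 4 and
            not loop.has_ret and
            loop.taken_branches <= 4)


def lsd_worthwhile(loop):
    """Structural limits plus the 70+ iteration guideline."""
    return lsd_candidate(loop) and loop.iterations >= 70


aligned = Loop(0x1000, 60, 16, 1, False, 100)
shifted = Loop(0x100C, 60, 16, 1, False, 100)
print(fetch_packets(aligned), lsd_candidate(aligned))
print(fetch_packets(shifted), lsd_candidate(shifted))
```

The second loop is identical except for its start address, which shows why aligning a hot loop can matter: the same 60 bytes of code span five fetch packets instead of four.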






Optimizing for Decode
Decoder issues up to 4 uOps for renaming/allocation per clock
• This creates a trade-off between one complex multi-uOp instruction
  and several simple single-uOp instructions
• For example, a single four uOp instruction is all that can be
  renamed/allocated in a single clock
• In some cases, multiple simple instructions may be a better
  choice than a single complex instruction
• Single uOp instructions allow more decoder flexibility
 • For example, 4-1-1-1 can be decoded in one clock
 • However, 2-2-2-1 takes three clocks to decode
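
The 4-1-1-1 and 2-2-2-1 examples above can be reproduced with a small model of the decoders. The sketch assumes one decoder that handles instructions of up to four uOps and three decoders that handle single-uOp instructions only, with instructions consumed strictly in program order; instructions longer than four uOps are outside the scope of this model.

```python
def decode_clocks(uops_per_instruction):
    """Count decode clocks for a sequence of instructions.

    Each entry is the number of uOps one instruction decodes into.
    Model: per clock, the first decoder takes the next instruction (up to
    4 uOps); the other three take following instructions only while they
    are single-uOp.
    """
    clocks = 0
    i = 0
    n = len(uops_per_instruction)
    while i < n:
        if uops_per_instruction[i] > 4:
            raise ValueError("instructions over 4 uOps are not modelled here")
        clocks += 1
        i += 1                      # complex-capable decoder takes one instruction
        simple_slots = 3            # remaining decoders: single-uOp only
        while simple_slots and i < n and uops_per_instruction[i] == 1:
            i += 1
            simple_slots -= 1
    return clocks


print(decode_clocks([4, 1, 1, 1]))
print(decode_clocks([2, 2, 2, 1]))
```

In the second sequence each two-uOp instruction has to wait for the first decoder, so only the trailing single-uOp instruction shares a clock with its predecessor.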




Software Development Tools for Intel® IoT PlatformsIntel® Software
 
Evaluating Microsoft Windows 8 Security on Intel Architecture Tablets
Evaluating Microsoft Windows 8 Security on Intel Architecture TabletsEvaluating Microsoft Windows 8 Security on Intel Architecture Tablets
Evaluating Microsoft Windows 8 Security on Intel Architecture TabletsIT@Intel
 
Windows 8 hardware sensors
Windows 8 hardware sensorsWindows 8 hardware sensors
Windows 8 hardware sensorsMatteo Pagani
 
Noile soluţii Intel pentru afaceri eficiente-23apr2010
Noile soluţii Intel pentru afaceri eficiente-23apr2010Noile soluţii Intel pentru afaceri eficiente-23apr2010
Noile soluţii Intel pentru afaceri eficiente-23apr2010Agora Group
 
Noile soluţii Intel pentru afaceri eficiente-20apr2010
Noile soluţii Intel pentru afaceri eficiente-20apr2010Noile soluţii Intel pentru afaceri eficiente-20apr2010
Noile soluţii Intel pentru afaceri eficiente-20apr2010Agora Group
 
Noile tehnologii INTEL pentru infrastructuri IT eficiente-19mar2010
Noile tehnologii INTEL pentru infrastructuri IT eficiente-19mar2010Noile tehnologii INTEL pentru infrastructuri IT eficiente-19mar2010
Noile tehnologii INTEL pentru infrastructuri IT eficiente-19mar2010Agora Group
 
Noile solutii Intel pentru afaceri eficiente-tm-20mai2010
Noile solutii Intel pentru afaceri eficiente-tm-20mai2010Noile solutii Intel pentru afaceri eficiente-tm-20mai2010
Noile solutii Intel pentru afaceri eficiente-tm-20mai2010Agora Group
 
What's under the hood of Exadata X2-2 and X2-8?
What's under the hood of Exadata X2-2 and X2-8?What's under the hood of Exadata X2-2 and X2-8?
What's under the hood of Exadata X2-2 and X2-8?Enkitec
 
Develop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster ReadyDevelop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster ReadyIntel IT Center
 
Intel Roadmap 2010
Intel Roadmap 2010Intel Roadmap 2010
Intel Roadmap 2010Umair Mohsin
 
Explore, design and implement threading parallelism with Intel® Advisor XE
Explore, design and implement threading parallelism with Intel® Advisor XEExplore, design and implement threading parallelism with Intel® Advisor XE
Explore, design and implement threading parallelism with Intel® Advisor XEIntel IT Center
 
Accelerating Insights in the Technical Computing Transformation
Accelerating Insights in the Technical Computing TransformationAccelerating Insights in the Technical Computing Transformation
Accelerating Insights in the Technical Computing TransformationIntel IT Center
 
Intel Knights Landing Slides
Intel Knights Landing SlidesIntel Knights Landing Slides
Intel Knights Landing SlidesRonen Mendezitsky
 
Understanding Intel Products from DarrenYaoYao
Understanding Intel Products from DarrenYaoYaoUnderstanding Intel Products from DarrenYaoYao
Understanding Intel Products from DarrenYaoYaoDarrenYaoYao
 
Accelerating Mission Critical Transformation at Red Hat Summit 2011
Accelerating Mission Critical Transformation at Red Hat Summit 2011Accelerating Mission Critical Transformation at Red Hat Summit 2011
Accelerating Mission Critical Transformation at Red Hat Summit 2011Pauline Nist
 
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel Software Brasil
 
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYehMAKERPRO.cc
 

Similaire à 01 intel processor architecture core (20)

Features of modern intel microprocessors
Features of modern intel microprocessorsFeatures of modern intel microprocessors
Features of modern intel microprocessors
 
Software Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT PlatformsSoftware Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT Platforms
 
Evaluating Microsoft Windows 8 Security on Intel Architecture Tablets
Evaluating Microsoft Windows 8 Security on Intel Architecture TabletsEvaluating Microsoft Windows 8 Security on Intel Architecture Tablets
Evaluating Microsoft Windows 8 Security on Intel Architecture Tablets
 
Windows 8 hardware sensors
Windows 8 hardware sensorsWindows 8 hardware sensors
Windows 8 hardware sensors
 
Intel
IntelIntel
Intel
 
Noile soluţii Intel pentru afaceri eficiente-23apr2010
Noile soluţii Intel pentru afaceri eficiente-23apr2010Noile soluţii Intel pentru afaceri eficiente-23apr2010
Noile soluţii Intel pentru afaceri eficiente-23apr2010
 
Noile soluţii Intel pentru afaceri eficiente-20apr2010
Noile soluţii Intel pentru afaceri eficiente-20apr2010Noile soluţii Intel pentru afaceri eficiente-20apr2010
Noile soluţii Intel pentru afaceri eficiente-20apr2010
 
Noile tehnologii INTEL pentru infrastructuri IT eficiente-19mar2010
Noile tehnologii INTEL pentru infrastructuri IT eficiente-19mar2010Noile tehnologii INTEL pentru infrastructuri IT eficiente-19mar2010
Noile tehnologii INTEL pentru infrastructuri IT eficiente-19mar2010
 
Noile solutii Intel pentru afaceri eficiente-tm-20mai2010
Noile solutii Intel pentru afaceri eficiente-tm-20mai2010Noile solutii Intel pentru afaceri eficiente-tm-20mai2010
Noile solutii Intel pentru afaceri eficiente-tm-20mai2010
 
What's under the hood of Exadata X2-2 and X2-8?
What's under the hood of Exadata X2-2 and X2-8?What's under the hood of Exadata X2-2 and X2-8?
What's under the hood of Exadata X2-2 and X2-8?
 
Develop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster ReadyDevelop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster Ready
 
Intel Roadmap 2010
Intel Roadmap 2010Intel Roadmap 2010
Intel Roadmap 2010
 
Explore, design and implement threading parallelism with Intel® Advisor XE
Explore, design and implement threading parallelism with Intel® Advisor XEExplore, design and implement threading parallelism with Intel® Advisor XE
Explore, design and implement threading parallelism with Intel® Advisor XE
 
Accelerating Insights in the Technical Computing Transformation
Accelerating Insights in the Technical Computing TransformationAccelerating Insights in the Technical Computing Transformation
Accelerating Insights in the Technical Computing Transformation
 
Intel Knights Landing Slides
Intel Knights Landing SlidesIntel Knights Landing Slides
Intel Knights Landing Slides
 
Understanding Intel Products from DarrenYaoYao
Understanding Intel Products from DarrenYaoYaoUnderstanding Intel Products from DarrenYaoYao
Understanding Intel Products from DarrenYaoYao
 
Accelerating Mission Critical Transformation at Red Hat Summit 2011
Accelerating Mission Critical Transformation at Red Hat Summit 2011Accelerating Mission Critical Transformation at Red Hat Summit 2011
Accelerating Mission Critical Transformation at Red Hat Summit 2011
 
Intel Roadmap
Intel RoadmapIntel Roadmap
Intel Roadmap
 
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
 
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
 

Dernier

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 

Dernier (20)

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

01 intel processor architecture core

  • 1. Intel® Core™ Microarchitecture Intel® Software College
  • 2. Objectives
    After completing this module you will be able to describe:
      • Components of an IA processor
      • Working flow of the instruction pipeline
      • Notable features of the architecture
    Intel® Processor Micro-architecture - Core® microarchitecture. Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
  • 3. Agenda
      • Introduction
      • Knowledge preparation
      • Notable features
      • Micro-architecture tour
      • Coding considerations
  • 4. Agenda (as on slide 3)
  • 5. Industrial Recognition
      • PC Format, May 2006: “Intel Strikes Back! Conroe is the name. Pistol-whipping Athlon 64s into burger meat is the game.”
      • “Intel's Next Generation Microarchitecture Unveiled”, Real World Tech: “Just as important as the technical innovations in Core MPUs, this microarchitecture will have a profound impact on the industry.”
      • “Intel Dishes the Knockout Punch to AMD with Conroe”, GD Hardware.com: “…the results were far more than we could hope for and it'll be amusing to see AMD's response to this beat-down session”
      • “Intel Regains Performance Crown”, Anandtech: “…At 2.8 or 3.0GHz, a Conroe EE would offer even stronger performance than what we’ve seen here.”
      • “Intel Reveals Conroe Architecture”, Extremetech: “…And not only was the Intel system running at 2.66GHz - a slower clock rate than the top Pentium 4 - it was outpacing an overclocked Athlon 64 FX-60. Wrap your brain around that idea for a bit…”
      • “Conroe Benchmarks - Intel Showing Big Strength”, Hot Hardware.com: “…Intel is poised to change the face of the desktop computing landscape…”
  • 6. Performance Summary
    Intel® Core™ Microarchitecture dramatically boosts Intel platform performance
      • Conroe & Woodcrest drive clear Desktop/Server performance leadership
      • Merom extends Intel Mobile performance leadership
    Intel® Core™ Microarchitecture-based platforms set the bar in Performance and Energy Efficiency for the Multi-Core era
      • Intel’s 3rd generation dual-core (while competition stuck on 1st generation)
      • New Intel high-performance ‘engine’: Wider, Smarter, Faster, More Efficient
    Best Processor on the Planet: Energy-Efficient Performance
      • Performance boosts [1]: 20% (Merom), 40% (Conroe), 80% (Woodcrest)
    The “Core™ Effect”: Intel® Core™ Microarchitecture ramp fuels broad roadmap accelerations!
    [1] Based on SPECint*_rate_base2000
  • 7. Agenda
      • Introduction
      • Knowledge preparation
          • Architecture vs. Microarchitecture
          • CISC vs. RISC
          • Performance Measurements
          • Pipeline Design
          • Power and Energy
          • Chip Multi-Processing
      • Notable features
      • Micro-architecture tour
      • Coding considerations
  • 8. Architecture and Micro-architecture
    What is Computer Architecture?
      • Architecture is the set of features which are externally visible:
          • Instruction set
          • Registers
          • Addressing modes
          • Bus protocols
    Intel Architectures (IA)
      • IA32/X86 (8-bit, 16-bit and 32-bit Integer architecture)
          • X87 (Floating Point extension)
          • MMX (Multi-Media extension)
          • SSE, SSE2, SSE3 (Streaming SIMD Extensions)
          • Intel® 64/EM64T (64-bit Integer extension of IA32)
      • IA64 (Intel new 64-bit architecture)
          • Itanium/Itanium 2 processor family
  • 9. Architecture and Micro-architecture (cont.)
    What is Micro-architecture?
      • Same as µ-Architecture or u-Architecture
      • “Invisible” features that provide meaningful value to the end user (whatever makes you buy a new compatible PC)
          • Programs run faster → Improved Performance
          • Reduced Power consumption → Extended Battery life
          • H/W fits into Smaller Form Factor
  • 10. Intel® Architecture History
      • Architecture: Instruction set definition and compatibility
          Examples: EPIC* (Itanium®), IA-32, IXA* (XScale)
      • Microarchitecture: Hardware implementation maintaining instruction set compatibility with high-level architecture
          Examples: P5, P6, Intel NetBurst®, Banias
      • Processors: Productized implementation of Microarchitecture
          Examples: Pentium®, Pentium® Pro, Pentium® II/III, Pentium® 4, Pentium® D, Xeon®, Pentium® M
    * IXA = Intel Internet Exchange Architecture; EPIC = Explicitly Parallel Instruction Computing
  • 11. Intel® Core™ Microarchitecture Processors
    Intel® NetBurst® + Mobile Microarchitecture + New Innovations → Intel® Core™ Microarchitecture
      • Intel® Core™ 2 Duo/Quad/Extreme processors
  • 12. RISC Approach to CPU design (RISC = Reduced Instruction Set Computers)
    Optimize H/W for common basic operations
      • Fixed instruction length
          • Shorter Execution Pipeline
          • Ease of Instruction Level Parallelism
      • Large number of registers
          • Fewer memory accesses
      • ‘Load/Store’ architecture
          • Shorter Execution Pipeline
          • Ease of advancing Loads
      • Branch Hints
          • Reduce pipeline flush events
      • ‘Exotic’ stuff to be implemented in S/W with minimal H/W support
          • No ‘complex’ H/W instructions
          • Handle exceptional conditions in S/W
    Examples: MIPS, IBM Power and PowerPC, Sun Sparc
    Achieve maximum performance by the right partitioning between H/W and S/W
  • 13. CISC Approach to CPU design (CISC = Complex Instruction Set Computers)
    Rich architecture
      • Variable length instructions
      • Complex addressing modes
    On-chip H/W - S/W partitioning required
      • H/W keeps executing ‘simple’ stuff
      • Complex instructions are ‘emulated’ using u-code routines from ROM
      • More instructions treated as ‘simple’ as more H/W is available
    COMPATIBILITY has some major advantages:
      • Large (and forever increasing) software base
      • Code development tools
      • Expertise
      • H/W - S/W spiral
    Examples: Intel IA32, Motorola 680X0
    Maximize information passed to the H/W
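The contrast between fixed-length (RISC) and variable-length (CISC) instructions on the two slides above can be sketched with a toy example. The encoding here is purely hypothetical and is not x86 or any real ISA: in the variable-length stream, the first byte of each instruction is assumed to hold that instruction's total length. The function names are illustrative as well.

```python
def fixed_length_starts(code: bytes, size: int = 4) -> list[int]:
    """Instruction start offsets when every instruction is `size` bytes.

    Each offset can be computed independently of the others, which is
    what makes fetching and decoding several instructions at once easy.
    """
    return list(range(0, len(code), size))


def variable_length_starts(code: bytes) -> list[int]:
    """Instruction start offsets for a toy variable-length encoding.

    Assumption (hypothetical encoding): the first byte of each
    instruction is its total length in bytes. The start of instruction
    N is only known after the length of instruction N-1 is decoded.
    """
    starts = []
    pos = 0
    while pos < len(code):
        starts.append(pos)
        length = code[pos]
        if length == 0:
            raise ValueError("zero-length instruction in toy stream")
        pos += length
    return starts


if __name__ == "__main__":
    toy_stream = bytes([2, 0xAA, 3, 0xBB, 0xCC, 1, 2, 0xDD])
    print(fixed_length_starts(bytes(8)))       # offsets known up front
    print(variable_length_starts(toy_stream))  # offsets found by walking
```

The sequential walk in `variable_length_starts` is the point of the sketch: boundaries in a variable-length stream depend on earlier instructions, so a decoder needs extra work to find them in parallel.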
  • 14. Performance Measurement
    Performance is the reciprocal of the “Time of execution”:
      $\text{Performance} \approx \dfrac{1}{\text{Time of Execution}} = \dfrac{1}{L \cdot CPI \cdot T_c}$
    Where:
      • $L$ = Code Length (# of machine instructions)
      • $CPI$ = Clock cycles Per Instruction
      • $T_c$ = Clock period (ns)
    Substitute $IPC = 1/CPI$ (Instructions Per Cycle) and $F = 1/T_c$ (Frequency):
      $\text{Performance} \approx \dfrac{IPC \cdot F}{L}$
    Levers: improve ILP (raises $IPC$), improve timing (raises $F$), architecture enhancements (reduce $L$).
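As a quick numeric check of the two forms of the formula on this slide, here is a minimal sketch. The function names and the sample numbers are illustrative assumptions, not figures from the deck.

```python
import math


def execution_time_s(code_length: int, cpi: float, clock_period_s: float) -> float:
    """Time of execution = L * CPI * Tc."""
    return code_length * cpi * clock_period_s


def performance(ipc: float, frequency_hz: float, code_length: int) -> float:
    """Performance ~ IPC * F / L, i.e. program runs per second."""
    return ipc * frequency_hz / code_length


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 instructions, CPI = 2, 1 ns clock period.
    L, cpi, tc = 1_000_000, 2.0, 1e-9
    t = execution_time_s(L, cpi, tc)
    p = performance(ipc=1 / cpi, frequency_hz=1 / tc, code_length=L)
    # Both forms describe the same quantity: performance is 1 / time.
    print(t, p, math.isclose(p, 1 / t))
```

Doubling `ipc` or `frequency_hz`, or halving `code_length`, each doubles the result, which is why the slide treats them as three separate levers.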
  • 15. Performance Measurement (cont.)
    Benchmark examples
      • Industry Standard
          • Spec (ISPEC, FSPEC)
          • TPC
      • Commercial
          • SysMark
          • MobileMark
          • PCMark
          • Sandra
          • ScienceMark
      • Applications
          • Video (Windows Media encoder, DivX)
          • Audio (Lame MP3)
          • Compression (RAR)
          • Content creation (3DSM, Photoshop, Premiere)
          • Latest Games (Doom III, FarCry, but changes fast)
      • Specific industries use specific benchmarks
          • Linux compilation, POVRay, LinPack, lmbench
    Performance considerations:
      • Which Code/Application to run?
      • Which OS?
      • Which other components in the platform?
      • Under which thermal conditions?
      • Multithreading? Multiprocessing?
  • 16. Design Considerations for Different Market Segments
    Constraints:
      • Desktop: thermally and area constrained
      • Extreme: unconstrained
      • Value: very area constrained
      • Mobile: thermally, energy and area constrained
      • Servers: thermally and energy constrained
    Micro-architecture is the Art of Tradeoffs between:
      • Schedule
      • Requirements / Standards
      • Performance
      • Features
      • Power / Energy
      • Area / Cost
  • 17. Design Metrics
    IPC = Instructions per Cycle
    • The more the better
    Latency (same as response time)
    • The time interval between
      • when a request for data is made and
      • when the data transfer completes
    • The less the better
    Throughput
    • The amount of work completed by the system per unit of time
    • The more the better
    • Measured in ops/sec
  • 18. CPU Pipeline
    Break the work into smaller pieces
    • Four basic stages of instruction life
      • Fetch - bring the instruction to the core
      • Decode - read operands from registers
      • Execute - perform the operation
      • Writeback - save the result to a register
    • Execution timing of simple instructions (legend: “op src1,src2 dst”)
      [Timing diagram: add eax, ebx (dst eax) and sub ecx, edx (dst ecx), each passing through F D E W]
    Increased throughput
    • Increased number of completed instructions per cycle
  • 19. Pipeline Design - Explore Parallelism
    A new instruction does not always depend on the previous one
    • A new instruction can start before the previous one has finished
    • ...if the different stages use different H/W resources
    Run instructions in parallel (pipeline)
      [Timing diagram: add eax, ebx (dst eax), sub ecx, edx (dst ecx), and or edi, esi (dst edi) each pass through F D E W, overlapped in the pipeline]
    Need to balance pipe stages
    • Each stage should take the same time for best throughput and utilization
    The clock cycle is determined by the longest path!
      [Figure: four instructions, each passing through Fetch, Decode, Exec, WB]
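The throughput gain from overlapping stages can be illustrated with an idealized cycle count. The sketch below assumes a stall-free four-stage pipeline that starts one instruction per cycle; the function names and stage delays are hypothetical and not taken from the slides.

```python
def sequential_cycles(num_instructions, num_stages):
    """Each instruction finishes all stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Ideal pipeline: fill once, then complete one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def clock_period_ns(stage_delays_ns):
    """The clock cycle is set by the slowest stage (the longest path)."""
    return max(stage_delays_ns)


def timeline(num_instructions, stages=("F", "D", "E", "W")):
    """One text row per instruction, shifted right by one stage per row."""
    return ["  " * i + " ".join(stages) for i in range(num_instructions)]


for row in timeline(3):
    print(row)
print(sequential_cycles(3, 4), pipelined_cycles(3, 4))
print(clock_period_ns([0.3, 0.5, 0.4, 0.3]))
```

With unbalanced stage delays, every stage is clocked at the slowest stage's delay, which is why balanced stages give the best throughput.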
  • 20. Pipeline Design – Fighting Stalls
    Data flow dependencies (instruction outputs/inputs)
    • Solved by bypasses, renaming, etc.
    Control flow dependencies
    • Solved by branch prediction
    Others (cache misses, long-latency instructions)
    • Solved by other dynamic scheduling techniques
  • 21. Race of CISC vs. RISC
    In modern CPUs, advanced µ-architecture techniques minimize the advantages of RISC over CISC
    • Branch prediction
      • Reduces the effect of extra pipeline stages
    • Register renaming
      • Effectively increases the number of registers
    • Out of order
      • Reduces the number of stalls caused by a shortage of registers
    • Speculative execution
      • Further reduces the number of stalls
    • Power saving features
      • Reduce the overhead when not needed
  • 22. µop – Intel’s Take on the CISC/RISC Race
    (CISC) instructions are translated into one or more (RISC) uops (micro-operations)
    • Fixed format
    • Wide and simple
    • Temp registers
    Usually one uop per instruction
    A complex instruction can be thousands of uops
    Stores are divided into two uops (STA and STD)
    Fusion plays games here
  • 23. Power and Energy
    Maximum power (TDP):
    • Cooling requirements
    • Cooling solution
    • Computer form factor and acoustic noise
    Average power
    • Battery life
    • Electricity bill
    General calculation:
    • P = frequency * voltage^2 * activity factor * capacitance + leakage
    Reducing TDP
    • Fewer transistors and wires
    • Smaller transistors and wires
    • Power features: less activity
    • Low-leakage transistors
    Reducing average power
    • Energy efficiency
    • Power states
    • Lower leakage
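The general power calculation on this slide can be evaluated directly. The sketch below uses invented numbers (2 GHz, 1.2 V, activity factor 0.1, 100 nF of switched capacitance, 5 W leakage) purely to show how the voltage-squared term dominates; none of these values describe a real processor.

```python
def total_power_watts(frequency_hz, voltage_v, activity_factor,
                      capacitance_f, leakage_w):
    """P = frequency * voltage^2 * activity factor * capacitance + leakage."""
    dynamic = frequency_hz * voltage_v ** 2 * activity_factor * capacitance_f
    return dynamic + leakage_w


# Hypothetical operating points: baseline, then frequency and voltage both cut 10%.
baseline = total_power_watts(2.0e9, 1.20, 0.1, 1.0e-7, 5.0)
scaled = total_power_watts(1.8e9, 1.08, 0.1, 1.0e-7, 5.0)

print(baseline, scaled)
# Dynamic power scales by 0.9 * 0.9**2 = 0.729 while leakage is held constant.
print((scaled - 5.0) / (baseline - 5.0))
```

Holding leakage constant is a simplification made here for clarity; in practice leakage also depends on voltage and temperature.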
  • 24. Dual/Multi Core and SMT
    Put more than one core per package
    Architectural change:
    • Software must be multi-threaded or multi-process
    • …but backward compatible with multiprocessor systems (MP)
    Several ways of implementing it
    • All of them are being used
      [Figure: package layouts combining cores, last-level caches (LLC), and I/O in different ways]
    SMT: run two (or more) threads on the same core, simultaneously
  • 25. Intel Approach
    [Figure: timeline from Q4 2000 through Q2 2003, Q2 2005, and Q3 2006 to Q4 2006 covering Intel® Pentium®, Intel® Pentium® with HT, Intel® Pentium® D, Intel® Core 2 Duo, and Intel® QX6700*, followed by a “?”; each processor is labeled with its thread count (growing from 1 to 4 threads over the timeline) and drawn with its state, execution units, cache, and bus]
    While single-core performance has increased due to clock speed, increased cache, and improved ILP, the biggest performance increases have come from thread-level parallelism.
  • 26. An “Acronym Cheat Sheet” of Parallel Computing
    CMP: Chip Multi Processor (two or more cores per package)
    • Dual Core: two cores in the same package
    • Quad Core: four cores in the same package
    DP: Dual Processor (two packages)
    MP: Multi Processor (four or more packages)
    SMT: Simultaneous Multi Threading (virtual multi core: HyperThreading)
  • 27. Agenda
    • Introduction
    • Knowledge preparation
    • Notable features
      • Wide Dynamic Execution
      • Smart Memory Access
      • Advanced Smart Cache
      • Advanced Digital Media Boost
      • Intelligent Power Capability
    • Micro-architecture tour
    • Coding considerations
  • 28. Intel® Core® Micro-architecture Notable Features
    Intel® Wide Dynamic Execution
    • 14-stage efficient pipeline
    • Wider execution path
    • Advanced branch prediction
    • Macro-fusion
      • Roughly 15% of all instructions are conditional branches
      • Macro-fusion fuses a comparison and a jump to reduce the micro-ops running down the pipeline
    • Micro-fusion
      • Merges the load and operation micro-ops into one micro-op
    • 64-bit support
      • Merom, Conroe, and Woodcrest support EM64T
    [Figure: pipeline block diagram. Instruction Fetch and PreDecode → Instruction Queue (5 wide) → Decode with uCode ROM (4 wide) → Rename/Alloc → Retirement Unit (ReOrder Buffer) (4 wide) → Schedulers → execution ports (three ALUs with Branch, FAdd, FMul, MMX/SSE, and FPmove units, plus Load and Store) → L1 D-Cache and D-TLB; 2M/4M shared L2 cache; FSB up to 10.4 Gb/s]
  • 29. Intel® Core® Micro-architecture Notable Features (cont.)
    Intel® Advanced Memory Access
    • Improved prefetching
    • Memory disambiguation
      • Advances a load ahead of a possible data dependency (pointer conflict)
    • Earlier loads hide memory latencies
  • 30. Intel® Core® Micro-architecture Notable Features (cont.)
    Intel® Advanced Smart Cache
    • Multi-core optimization
      • Shared between the two cores
      • Advanced Transfer Cache architecture
      • Reduced bus traffic
      • Both cores have full access to the entire cache
    • Dynamic cache sizing
  • 31. Intel® Core® Micro-architecture Notable Features (cont.)
    Advantages of Shared Cache
    [Figure: CPU1 and CPU2 with separate L2 caches, connected to memory over the Front Side Bus (FSB); a cache line is shipped from one L2 to the other. Shipping an L2 cache line costs ~half an access to memory]
  • 32. Intel® Core® Micro-architecture Notable Features (cont.)
    Advantages of Shared Cache (cont.)
    [Figure: CPU1 and CPU2 sharing one L2 cache, connected to memory over the Front Side Bus (FSB)]
    L2 is shared: no need to ship the cache line
  • 33. Intel® Core® Micro-architecture Notable Features (cont.)
    Intel® Advanced Digital Media Boost
    SIMD operation (SSE/SSE2/SSE3/SSSE3)
    • Single-cycle SIMD operation
      • 8 single-precision flops/cycle
      • 4 double-precision flops/cycle
    • Wide operations
      • 128-bit packed add
      • 128-bit packed multiply
      • 128-bit packed load
      • 128-bit packed store
    • Support for Intel® EM64T instructions
    [Figure: a 128-bit SSE/2/3 operation on SOURCE (X4 X3 X2 X1, bits 127..0) and DEST (Y4 Y3 Y2 Y1). Core™ µarch: clock cycle 1 produces X4opY4, X3opY3, X2opY2, X1opY1. Previous: clock cycle 1 produces X2opY2, X1opY1; clock cycle 2 produces X4opY4, X3opY3]
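The flops-per-cycle figures on this slide follow from simple lane arithmetic. The sketch below assumes the peak is reached by issuing one packed add and one packed multiply in the same cycle (the deck later places these on ports 1 and 0); the function name is hypothetical.

```python
def peak_flops_per_cycle(datapath_bits, element_bits, ops_issued_per_cycle):
    """Lanes processed per operation times operations issued per cycle."""
    lanes = datapath_bits // element_bits
    return lanes * ops_issued_per_cycle


# Assumption: one packed add plus one packed multiply issue every cycle.
single_precision = peak_flops_per_cycle(128, 32, 2)
double_precision = peak_flops_per_cycle(128, 64, 2)

# The "Previous" rows of the figure process 64 bits of a 128-bit operation
# per cycle, which halves the per-cycle peak under the same assumption.
previous_single_precision = peak_flops_per_cycle(64, 32, 2)

print(single_precision, double_precision, previous_single_precision)
```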
  • 34. Intel® Core® Micro-architecture Notable Features
    Intel® Advanced Digital Media Boost
    • Additional media instructions: Supplemental Streaming SIMD Extensions 3 (SSSE3)
      • 16 new packed integer instructions
      • Targeting video encode/decode
    • Significantly improved string operations
      • REP MOVS and REP STOS
      • ~8 bytes/cycle throughput
      • Mileage may vary
  • 35. Intel® Core® Micro-architecture Notable Features
    Intel® Advanced Digital Media Boost
    • Supplemental SSE3 (SSSE3)
      • Horizontal addition/subtraction: PHADDW, PHADDSW, PHADDD, PHSUBW, PHSUBSW, PHSUBD
      • Packed absolute values: PABSB, PABSW, PABSD
      • Multiply and add packed signed/unsigned bytes: PMADDUBSW
      • Packed multiply high with round and scale: PMULHRSW
      • Packed shuffle bytes: PSHUFB
      • Packed sign: PSIGNB/W/D
      • Packed align right: PALIGNR
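To make two of the listed instructions concrete, the following sketch emulates the 128-bit forms of PHADDW and PSHUFB on plain Python lists. It models only the element-wise results, with element 0 as the least significant lane; the helper names are illustrative and this is not how one would use the instructions in real code (that would be through assembly or compiler intrinsics).

```python
def _wrap16(value):
    """Wrap an integer to a signed 16-bit value, as non-saturating adds do."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000


def phaddw(dst, src):
    """PHADDW: add adjacent 16-bit pairs; dst pairs fill the low half,
    src pairs fill the high half of the result."""
    if len(dst) != 8 or len(src) != 8:
        raise ValueError("expected eight 16-bit elements per operand")
    low = [_wrap16(dst[i] + dst[i + 1]) for i in range(0, 8, 2)]
    high = [_wrap16(src[i] + src[i + 1]) for i in range(0, 8, 2)]
    return low + high


def pshufb(dst, mask):
    """PSHUFB: each result byte is dst[mask & 0x0F], or 0 if mask bit 7 is set."""
    if len(dst) != 16 or len(mask) != 16:
        raise ValueError("expected sixteen bytes per operand")
    return [0 if m & 0x80 else dst[m & 0x0F] for m in mask]


print(phaddw([1, 2, 3, 4, 5, 6, 7, 8], [10, 20, 30, 40, 50, 60, 70, 80]))
# Reversing a vector with a single byte shuffle:
print(pshufb(list(range(16)), list(range(15, -1, -1))))
```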
  • 36. Intel® Core® Micro-architecture Notable Features (cont.)
    Intelligent Power Capability
    • Advanced power gating and dynamic power coordination
      • Multi-point demand-based switching
      • Voltage-frequency switching separation
      • Supports transitions to deeper sleep modes
      • Event blocking
      • Clock partitioning and recovery
      • Dynamic bus parking
    • During periods of high-performance execution, many parts of the chip core can be shut off
  • 37. Agenda
    • Introduction
    • Knowledge preparation
    • Notable features
    • Micro-architecture tour
      • Front End
      • Out-Of-Order Execution Core
      • Memory Sub-system
    • Coding considerations
  • 38. Intel® Core® Micro-architecture Drill-down
    [Figure: block diagram with icache, branch prediction unit, predecode, instruction queue, instruction decode, MS, register alias table, ALLOC, Reservation Station, Re-Order Buffer, execution units (integer, FP, SIMD (3x), load, store address, store data), memory order buffer, data cache unit, and page miss handler]
  • 39. Agenda
    • Introduction
    • Knowledge preparation
    • Notable features
    • Micro-architecture tour
      • Front End
      • Out-Of-Order Execution Core
      • Memory Sub-system
    • Coding considerations
  • 40. Core® Micro-architecture Front End
    Prepares instructions before they are executed
    • Instruction Fetch Unit
    • Instruction Queue
    • Instruction Decode Unit
    • Branch Prediction Unit
    [Figure: front-end blocks: icache, branch prediction unit, predecode, instruction queue, instruction decode, MS]
  • 41. Intel® Core™ Microarchitecture – Front End: Instruction Queue
    Buffer between the instruction pre-decode unit and the decoder
    • Up to six predecoded instructions written per cycle
    • 18 instructions contained in the IQ
    • Up to 5 instructions read from the IQ
    Potential loop cache
    Loop Stream Detector (LSD) support
    • Re-use of decoded instructions
    • Potential power saving
  • 42. Intel® Core™ Microarchitecture – Front End: Macro-Fusion
    • Roughly 15% of all instructions are conditional branches.
    • Macro-fusion merges two instructions into a single micro-op, as if the two instructions were a single long instruction.
    • Enhanced Arithmetic Logic Unit (ALU) for macro-fusion. Each macro-fused instruction executes with a single dispatch.
    • Not supported in EM64T long mode
    [Figure: the Scheduler dispatches the fused “cmpjae eax, [mem], label” to Execution and Branch Eval, which send flags and target to Write back]
  • 43. Intel® Core™ Microarchitecture – Front End: Macro-Fusion Absent
    • Read four instructions from the Instruction Queue
    • Each instruction gets decoded into separate uops
    Enabling example:
      for (int i=0; i<100000; i++) { … }
    Instruction Queue:
      addps xmm0, [EAX+16]
      mulps xmm0, xmm0
      movps [EAX+240], xmm0
      cmp eax, 100000
      jge label
    Cycle 1:
      addps xmm0, [EAX+16]     dec0
      mulps xmm0, xmm0         dec1
      movps [EAX+240], xmm0    dec2
      cmp eax, 100000          dec3
    Cycle 2:
      jge label                dec0
  • 44. Intel® Core™ Microarchitecture – Front End: Macro-Fusion Present
    • Read five instructions from the Instruction Queue
    • Send the fusable pair to a single decoder
    • A single uop represents two instructions
    Enabling example:
      for (unsigned int i=0; i<100000; i++) { … }
    Instruction Queue:
      addps xmm0, [EAX+16]
      mulps xmm0, xmm0
      movps [EAX+240], xmm0
      cmp eax, 100000
      jae label
    Cycle 1:
      addps xmm0, [EAX+16]         dec0
      mulps xmm0, xmm0             dec1
      movps [EAX+240], xmm0        dec2
      cmpjae eax, 100000, label    dec3
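The decoder occupancy shown on the two macro-fusion slides can be reproduced with a toy model. The sketch below assumes four decoders that each take one slot per cycle, and assumes that only `cmp`/`test` followed by one of a small set of unsigned or equality jumps can fuse; that set is an inference from the slides' contrast between `jge` (signed, not fused) and `jae` (unsigned, fused), not a statement of the exact hardware rules.

```python
import math

# Assumption for illustration only: conditions treated as fusable.
FUSABLE_JUMPS = {"ja", "jae", "jb", "jbe", "je", "jne"}


def decode_slots(instructions):
    """Group instructions into decoder slots, fusing cmp/test + jump pairs."""
    slots = []
    i = 0
    while i < len(instructions):
        mnemonic = instructions[i].split()[0].lower()
        following = ""
        if i + 1 < len(instructions):
            following = instructions[i + 1].split()[0].lower()
        if mnemonic in ("cmp", "test") and following in FUSABLE_JUMPS:
            slots.append(instructions[i] + " + " + instructions[i + 1])
            i += 2
        else:
            slots.append(instructions[i])
            i += 1
    return slots


def decode_cycles(instructions, decoders=4):
    """Cycles needed when each decoder handles one slot per cycle."""
    return math.ceil(len(decode_slots(instructions)) / decoders)


body = ["addps xmm0, [EAX+16]", "mulps xmm0, xmm0", "movps [EAX+240], xmm0"]
signed_loop = body + ["cmp eax, 100000", "jge label"]
unsigned_loop = body + ["cmp eax, 100000", "jae label"]

print(decode_cycles(signed_loop))    # jge does not fuse in this model
print(decode_cycles(unsigned_loop))  # cmp + jae share one decoder slot
```

This matches the slides' point that switching the loop counter to `unsigned int` lets the compare and branch travel as one uop.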
  • 45. Intel® Core™ Microarchitecture – Front End: Instruction Decode / Micro-Op Fusion
    • Frequent pairs of micro-operations derived from the same macro instruction can be fused into a single micro-operation
    • Micro-op fusion effectively widens the pipeline
  • 46. Intel® Core™ Microarchitecture – Front End: Instruction Decode / Micro-Fusion (cont.)
    u-ops of a store “movps [EAX+240], xmm0”:
    • Separate u-ops: sta eax+240 and std xmm0, [eax+240]
    • Fused u-op: st xmm0, [eax+240]
  • 47. Intel® Core™ Microarchitecture – Front End: Branch Prediction Improvements
    Intel® Pentium® 4 Processor branch prediction PLUS the following two improvements:
    • Indirect Branch Predictor
    • Loop Detector
    Branch mispredictions reduced by >20%
  • 48. Agenda
    • Introduction
    • Knowledge preparation
    • Notable features
    • Micro-architecture tour
      • Front End
      • Out-Of-Order Execution Core
      • Memory Sub-system
    • Coding considerations
  • 49. Core® Micro-architecture Execution Core
    Accepts decoded u-ops, assigns resources, then executes and retires the u-ops
    • Renamer
    • Reservation station (RS)
    • Issue ports
    • Execution unit
    [Figure: execution-core blocks: register alias table, ALLOC, Reservation Station, Re-Order Buffer, and execution units (integer, FP, SIMD (3x), load, store address, store data)]
  • 50. Intel® Core™ Microarchitecture – Execution Core: Execution Core Building Blocks
    [Figure: the Renamer feeds the RS and the ROB. RS ports are numbered: ports 0, 1, and 5 feed the execution units (labeled Integer, SIMD Integer, SIMD/Integer MUL, and Floating Point), port 2 feeds Load, and ports 3 and 4 feed Store; Load and Store connect to the Memory Sub-system]
  • 51. Intel® Core™ Microarchitecture – Execution Core: Issue Ports and Execution Units
    6 dispatch ports from the RS
    • 3 execution ports (shared for integer / FP / SIMD)
    • Load
    • Store (address)
    • Store (data)
    128-bit SSE implementation
    • Port 0 has packed multiply (4 cycles SP, 5 cycles DP, pipelined)
    • Port 1 has packed add (3 cycles, all precisions)
  • 52. Intel® Core™ Microarchitecture – Execution Core: Retirement Unit
    ReOrder Buffer (ROB)
    • Holds micro-ops in various stages of completion
    • Buffers completed micro-ops
    • Updates the architectural state in order
    • Manages ordering of exceptions
    [Figure: register alias table, ALLOC, Reservation Station, Re-Order Buffer]
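The ROB's role of buffering completed micro-ops and updating architectural state in order can be sketched as a queue that only retires from its head. The class and method names below are hypothetical, and the model ignores exceptions, capacity limits, and retirement width.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: micro-ops complete in any order but retire in program order."""

    def __init__(self):
        self.entries = deque()  # each entry is [name, completed]

    def allocate(self, name):
        """Add a micro-op at the tail, in program order."""
        self.entries.append([name, False])

    def complete(self, name):
        """Mark a micro-op as executed; it stays buffered until it can retire."""
        for entry in self.entries:
            if entry[0] == name:
                entry[1] = True

    def retire(self):
        """Retire completed micro-ops from the head only, oldest first."""
        retired = []
        while self.entries and self.entries[0][1]:
            retired.append(self.entries.popleft()[0])
        return retired


rob = ReorderBuffer()
for uop in ("a", "b", "c"):
    rob.allocate(uop)

rob.complete("c")
first = rob.retire()   # nothing retires: "a" at the head is not done
rob.complete("a")
second = rob.retire()  # only "a" retires: "b" blocks "c"
rob.complete("b")
third = rob.retire()   # "b" and the already completed "c" retire together
print(first, second, third)
```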
  • 53. Agenda
    • Introduction
    • Knowledge preparation
    • Notable features
    • Micro-architecture tour
      • Front End
      • Out-Of-Order Execution Core
      • Memory Sub-system
    • Coding considerations
  • 54. Core® Micro-architecture Memory Sub-System
    Memory Ordering Buffer
    • Store Address Buffer (SAB)
      • Holds the address of each store that has not yet been performed
      • Each load compares its address to every store older than itself
      • If it finds a hole…
    • Store Data Buffer
      • Holds the data of each store that has not yet been performed
      • If a load hits in the SAB, the data is forwarded from here
    • Load Buffer
      • Holds the addresses of non-retired loads
      • Used for snoops and re-dispatch
    • One 128-bit load and one 128-bit store per cycle to different memory locations
    • Out-of-order memory operations
  • 55. Intel® Core™ Microarchitecture – Memory Sub-system: Core® Micro-architecture Memory Sub-System (cont.)
    • 32 KB D-Cache (8-way, 64-byte line size)
    • Shared second-level (L2) instruction and data cache: 2 MB 8-way or 4 MB 16-way
    • Cache-to-cache transfer
      • Improves producer / consumer style MP
    • Wider interface to L2
      • Reduced interference
      • Processor line fill is 2 cycles
    • Higher bandwidth from the L2 cache to the core
      • ~14 clock latency and 2 clock throughput
    Load & Store access order:
      1. L1 cache of the immediate core
      2. L1 cache of the other core
      3. L2 cache
      4. Memory
    [Figure: Core1 and Core2 sharing a 2 MB L2 cache connected to the bus]
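The cache parameters on this slide determine how an address maps onto a cache set. The sketch below works through that arithmetic for the 32 KB, 8-way, 64-byte-line D-cache. The 64-byte line size for the L2 is an assumption here, since the slide states a line size only for the D-cache, and the function names are illustrative.

```python
def num_sets(cache_bytes, ways, line_bytes):
    """Number of sets = capacity / (associativity * line size)."""
    return cache_bytes // (ways * line_bytes)


def split_address(address, sets, line_bytes):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = address % line_bytes
    index = (address // line_bytes) % sets
    tag = address // (line_bytes * sets)
    return tag, index, offset


KB = 1024
MB = 1024 * KB

l1d_sets = num_sets(32 * KB, ways=8, line_bytes=64)
# Assumption: the L2 also uses 64-byte lines.
l2_sets_2mb = num_sets(2 * MB, ways=8, line_bytes=64)
l2_sets_4mb = num_sets(4 * MB, ways=16, line_bytes=64)

print(l1d_sets, l2_sets_2mb, l2_sets_4mb)
print(split_address(0x12345, l1d_sets, 64))
```

Under that assumption the 2 MB 8-way and 4 MB 16-way configurations have the same number of sets, so the larger part doubles capacity purely by doubling associativity.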
  • 56. Intel® Core™ Microarchitecture – Memory Sub-system: Advanced Memory Access / Enhanced Data Pre-fetch Logic
    Speculates on the next needed data and loads it into the cache by HW and/or SW
    [Figure: parking analogy: Door (L1 Cache), Valet Parking Area (L2 Cache), Main Parking Lot (External Memory)]
  • 57. Intel® Core™ Microarchitecture – Memory Sub-system: Advanced Memory Access / Enhanced Data Pre-fetch Logic (cont.)
    • L1D cache prefetching
      • Data Cache Unit Prefetcher
        • Known as the streaming prefetcher
        • Recognizes ascending access patterns in recently loaded data
        • Prefetches the next line into the processor's cache
      • Instruction Based Stride Prefetcher
        • Prefetches based upon a load having a regular stride
        • Can prefetch forward or backward 2 Kbytes
        • 1/2 default page size
    • L2 cache prefetching: Data Prefetch Logic (DPL)
      • Prefetches data to the 2nd-level cache before the DCU requests the data
      • Maintains 2 tables for tracking loads
        • Upstream – 16 entries
        • Downstream – 4 entries
      • Every load is either found in the DPL or generates a new entry
      • Upon recognition of the 2nd load of a “stream”, the DPL will prefetch the next load
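The idea behind the instruction-based stride prefetcher can be sketched as a small table keyed by the load instruction's address. This is a simplified model, not the hardware algorithm: the class name, the rule of predicting after the same stride is seen twice in a row, and the unbounded table are all assumptions. Only the 2 KB forward/backward limit comes from the slide.

```python
class StridePrefetcher:
    """Toy stride prefetcher: one tracking entry per load instruction."""

    MAX_STRIDE = 2048  # the slide's 2 KB limit, forward or backward

    def __init__(self):
        self.table = {}  # load instruction address -> (last address, last stride)

    def access(self, load_pc, address):
        """Record a load; return an address to prefetch, or None."""
        prefetch = None
        stride = None
        previous = self.table.get(load_pc)
        if previous is not None:
            last_address, last_stride = previous
            stride = address - last_address
            # Assumption: predict once the same non-zero stride repeats.
            if (stride != 0 and stride == last_stride
                    and abs(stride) <= self.MAX_STRIDE):
                prefetch = address + stride
        self.table[load_pc] = (address, stride)
        return prefetch


forward = StridePrefetcher()
forward_hints = [forward.access(0x400, a) for a in (1000, 1064, 1128, 1192)]

backward = StridePrefetcher()
backward_hints = [backward.access(0x500, a) for a in (5000, 4900, 4800)]

too_far = StridePrefetcher()
too_far_hints = [too_far.access(0x600, a) for a in (0, 4096, 8192)]

print(forward_hints, backward_hints, too_far_hints)
```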
  • 58. Intel® Core™ Microarchitecture – Memory Sub-system: Advanced Memory Access / Memory Disambiguation
    Memory disambiguation predictor
    • Loads that are predicted NOT to forward from a preceding store are allowed to schedule as early as possible
      • This increases the performance of OOO memory pipelines
    Disambiguated loads are checked at retirement
    • Extension to the existing coherency mechanism
    • Invisible to software and system
  • 59. Intel® Core™ Microarchitecture – Memory Sub-system: Advanced Memory Access / Memory Disambiguation Absent
    Load4 must WAIT until the previous stores complete
    [Figure: program order Store1 Y, Load2 Y, Store3 W, Load4 X, alongside memory holding Data W, Data Z, Data Y, Data X]
  • 60. Intel® Core™ Microarchitecture – Memory Sub-system: Advanced Memory Access / Memory Disambiguation Present
    Loads can decouple from stores
    Load4 can get its data WITHOUT waiting for the stores
    [Figure: execution order Load4 X, Store1 Y, Load2 Y, Store3 W, alongside memory holding Data W, Data Z, Data Y, Data X]
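The four-operation example on the memory disambiguation slides can be replayed in a few lines. The sketch below computes, with full knowledge of the addresses, which loads touch no older store's address and so could safely run ahead. Real hardware has to predict this before the store addresses are known and verify the guess at retirement; the function here is only the "oracle" answer that such a check would compare against, and its name is hypothetical.

```python
def hoistable_loads(operations):
    """Return indices of loads whose address matches no older store.

    operations is a list of (kind, address) tuples in program order,
    where kind is "store" or "load".
    """
    older_store_addresses = set()
    hoistable = []
    for index, (kind, address) in enumerate(operations):
        if kind == "store":
            older_store_addresses.add(address)
        elif address not in older_store_addresses:
            hoistable.append(index)
    return hoistable


# Program order from the slides: Store1 Y, Load2 Y, Store3 W, Load4 X.
program = [("store", "Y"), ("load", "Y"), ("store", "W"), ("load", "X")]
print(hoistable_loads(program))  # index 3 is Load4 X
```

Load2 is excluded because it reads the location Store1 writes, which is the store-forwarding case rather than a disambiguation case.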
  • 61. Intel® Core™ Microarchitecture – Memory Sub-system: Advanced Memory Access / Store Forwarding
      If a load follows a store and reloads the data that the store writes to memory, the micro-architecture can forward the data directly from the store to the load
      [Figure: Store1 Y followed by Load2 Y; Data Y reaches the load through internal buffers rather than through memory]
  • 62. Advanced Memory Access / Store Forwarding: Aligned Store Cases
      [Figure: for each store size, the loads shown at each naturally aligned position within the stored data]
      Store size      Loads shown
      store 16 bit    1 × load 16; 2 × ld 8
      store 32 bit    1 × load 32; 2 × load 16; 4 × ld 8
      store 64 bit    1 × load 64; 2 × load 32; 4 × load 16; 8 × ld 8
      store 128 bit   1 × load 128; 2 × load 64; 4 × load 32; 8 × load 16; 16 × ld 8
  • 63. Advanced Memory Access / Store Forwarding: Unaligned Cases
      Note that unaligned store forwarding does not occur when the load crosses a cache line boundary
      [Figure: for each store size, the loads shown within the stored data; loads marked ‡ are the ones the footnote applies to]
      Store size      Loads shown
      store 16 bit    load 16‡; 2 × ld 8
      store 32 bit    load 32‡; load 16‡, load 16; 4 × ld 8
      store 64 bit    load 64; load 32‡, load 32; load 16‡, 3 × load 16; 8 × ld 8
      Legend: box shading distinguishes "Store forwarded to load" from "No forwarding"
      ‡: No forwarding if the load crosses a cache line boundary
      Note: Unaligned 128-bit stores are issued as two 64-bit stores. This provides two alignments for store forwarding
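The two conditions named on the store-forwarding slides (the load reloads data the store wrote, and the load does not cross a cache line boundary) can be expressed as a small check. This is a minimal sketch under stated assumptions: the function names are hypothetical, a 64-byte cache line is assumed, and the conditions are treated as necessary rather than sufficient, because the per-size alignment cases in the figures are more specific than simple containment.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 64u  /* assumed cache line size */

/* 1 if the access [addr, addr + size) touches more than one cache line.
 * Assumes size > 0. */
int crosses_cache_line(uint64_t addr, uint32_t size)
{
    return (addr / CACHE_LINE_BYTES) != ((addr + size - 1) / CACHE_LINE_BYTES);
}

/* 1 if every byte the load reads was written by the store. */
int load_within_store(uint64_t st_addr, uint32_t st_size,
                      uint64_t ld_addr, uint32_t ld_size)
{
    return ld_addr >= st_addr &&
           ld_addr + ld_size <= st_addr + st_size;
}

/* Necessary (not sufficient) conditions for forwarding in this sketch. */
int may_forward(uint64_t st_addr, uint32_t st_size,
                uint64_t ld_addr, uint32_t ld_size)
{
    return load_within_store(st_addr, st_size, ld_addr, ld_size) &&
           !crosses_cache_line(ld_addr, ld_size);
}
```

A 4-byte load from the upper half of an aligned 8-byte store passes both checks, while a load that starts inside the store but runs past its end, or one that straddles two cache lines, is rejected.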
  • 64. Agenda
      Introduction
      Knowledge preparation
      Notable features
      Micro-architecture tour
      Coding considerations
  • 65. Optimizing for Instruction Fetch and PreDecode
      Avoid "Length Changing Prefixes" (LCPs)
      • Affects instructions with immediate data or offset
        • Operand Size Override (66H)
        • Address Size Override (67H) [obsolete]
      • LCPs change the length decoding algorithm, increasing the processing time from one cycle to six cycles (or eleven cycles when the instruction spans a 16-byte boundary)
      • The REX (EM64T) prefix (4xH) is not an LCP
        • The REX prefix does lengthen the instruction by one byte, so use of the first eight general registers in EM64T is preferred
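The cycle counts quoted above can be turned into a small worked cost model. This is only a sketch of the slide's arithmetic: the function names are hypothetical, `start` is assumed to be the byte offset of the instruction in the code stream, and the only numbers used are the one, six and eleven cycles stated on the slide.

```c
/* Hypothetical cost model using only the cycle counts quoted on the slide. */

/* 1 if an instruction of `length` bytes starting at byte offset `start`
 * occupies more than one 16-byte block. Assumes length > 0. */
int spans_16_byte_boundary(unsigned start, unsigned length)
{
    return (start / 16u) != ((start + length - 1u) / 16u);
}

/* Length-decoding cost in cycles for one instruction. */
int predecode_cycles(int has_lcp, unsigned start, unsigned length)
{
    if (!has_lcp)
        return 1;
    return spans_16_byte_boundary(start, length) ? 11 : 6;
}
```

As an illustration, a 16-bit immediate move such as `mov ax, 0x1234` (bytes `66 B8 34 12` outside 16-bit mode) carries the 66H prefix together with immediate data, so the model charges it six cycles, or eleven if it starts at offset 14 of a 16-byte block.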
  • 66. Optimizing for Instruction Queue
      Includes a "Loop Stream Detector" (LSD)
      • Potentially very high bandwidth instruction streaming
      • A number of requirements to make use of the LSD
        • Maximum of 18 instructions in up to four 16-byte packets
        • No RET instructions (hence, little practical use for CALLs)
        • Up to four taken branches allowed
        • Most effective at 70+ iterations
      • LSD is after PreDecode, so there is no added cost for LCPs
      • Trade off LSD against conventional loop unrolling
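The LSD requirements listed above amount to a checklist that can be written down directly. The sketch below is an illustration only: the `loop_shape` structure and both function names are hypothetical, and the caller is assumed to have counted the loop's instructions, 16-byte fetch packets and taken branches by hand or from a disassembly.

```c
/* Hypothetical description of a loop body; field names are illustrative. */
typedef struct {
    unsigned instructions;    /* instructions in the loop body */
    unsigned fetch_packets;   /* 16-byte packets the loop body occupies */
    unsigned taken_branches;  /* taken branches per iteration */
    int      contains_ret;    /* nonzero if the body executes a RET */
} loop_shape;

/* 1 if the loop meets the hard requirements listed on the slide. */
int lsd_eligible(const loop_shape *l)
{
    return l->instructions <= 18 &&
           l->fetch_packets <= 4 &&
           l->taken_branches <= 4 &&
           !l->contains_ret;
}

/* 1 if the loop is eligible and runs long enough to be in the range the
 * slide calls most effective (70 or more iterations). */
int lsd_worthwhile(const loop_shape *l, unsigned long iterations)
{
    return lsd_eligible(l) && iterations >= 70;
}
```

A check like this makes the trade-off with unrolling concrete: unrolling a loop that currently fits may push it past 18 instructions or four packets and take it out of the LSD.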
  • 67. Optimizing for Decode
      Decoder issues up to 4 uOps for renaming/allocation per clock
      • This creates a trade-off between more complex instruction uOps and multiple simple instruction uOps
        • For example, a single four-uOp instruction is all that can be renamed/allocated in a single clock
        • In some cases, multiple simple instructions may be a better choice than a single complex instruction
      • Single-uOp instructions allow more decoder flexibility
        • For example, 4-1-1-1 can be decoded in one clock
        • However, 2-2-2-1 takes three clocks to decode
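The two decode examples above can be reproduced with a small counting model. It is a simplified sketch, not a description of the real decoders: the function name `decode_clocks` is hypothetical, and the model assumes one decoder that accepts an instruction of up to four uOps plus three decoders that accept only single-uOp instructions, with instructions taken strictly in program order.

```c
#include <stddef.h>

/* Simplified 4-1-1-1 decode model (an assumption made for illustration).
 * `uops[i]` is the uOp count of instruction i, assumed to be 1..4.
 * Returns the number of clocks needed to decode all n instructions. */
int decode_clocks(const int *uops, size_t n)
{
    int clocks = 0;
    size_t i = 0;

    while (i < n) {
        clocks++;
        i++;  /* the first decoder takes the next instruction, whatever its size */

        /* the three remaining decoders accept single-uOp instructions only */
        for (int slot = 1; slot < 4 && i < n && uops[i] == 1; slot++)
            i++;
    }
    return clocks;
}
```

Under this model the sequence 4-1-1-1 decodes in one clock, while 2-2-2-1 needs three, because each two-uOp instruction has to wait for the first decoder.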