Superscalar and VLIW Architectures
Parallel processing [2]
Processing instructions in parallel requires
   three major tasks:
1. checking dependencies between
   instructions to determine which
   instructions can be grouped together for
   parallel execution;
2. assigning instructions to the functional
   units on the hardware;
3. determining when instructions are initiated
   or placed together into a single word.
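The first task, dependence checking, can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the `Instr` record and the RAW/WAR/WAW labels (the usual names for true, anti- and output dependences) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: one destination, some sources."""
    name: str
    dest: str
    srcs: tuple


def hazards(first: Instr, second: Instr) -> set:
    """Return the dependence types that force `second` to stay after `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # true dependence: second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")  # anti-dependence: second overwrites an input of first
    if first.dest == second.dest:
        found.add("WAW")  # output dependence: both write the same register
    return found


def can_group(first: Instr, second: Instr) -> bool:
    """Two instructions may be grouped for parallel execution if no hazard links them."""
    return not hazards(first, second)


i1 = Instr("add", "r1", ("r2", "r3"))
i2 = Instr("sub", "r4", ("r1", "r5"))   # reads r1, which i1 writes
i3 = Instr("mul", "r6", ("r7", "r8"))   # touches none of i1's registers

print(hazards(i1, i2), can_group(i1, i3))
```

Whether this check runs in hardware at issue time or in the compiler is exactly what separates the architecture categories that follow.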
Major categories [2]
VLIW – Very Long Instruction Word
EPIC – Explicitly Parallel Instruction Computing
Superscalar Processors [1]

    Superscalar processors are designed to exploit
     more instruction-level parallelism in user
     programs.
    Only independent instructions can be executed
     in parallel without causing a wait state.
    The amount of instruction-level parallelism
     varies widely depending on the type of code
     being executed.
Pipelining in Superscalar Processors [1]
     In order to fully utilise a superscalar processor
      of degree m, m instructions must be executable
      in parallel. This situation may not be true in all
      clock cycles. In that case, some of the pipelines
      may be stalling in a wait state.
     In a superscalar processor, the simple
      operation latency should require only one cycle,
      as in the base scalar processor.
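To make the point about unused pipelines concrete, here is a small sketch of in-order issue on a machine of degree m. It assumes single-cycle latency for every operation and checks only true and output dependences inside an issue group; the function name and the sample program are hypothetical.

```python
def issue_in_order(program, m):
    """Group consecutive instructions into issue cycles of at most m instructions.

    program: list of (dest, srcs) tuples in program order.
    Returns a list of groups, each a list of instruction indices issued together.
    """
    groups = []
    i = 0
    while i < len(program):
        group = [i]
        written = {program[i][0]}
        j = i + 1
        while j < len(program) and len(group) < m:
            dest, srcs = program[j]
            if dest in written or any(s in written for s in srcs):
                break  # depends on this group: must wait for the next cycle
            group.append(j)
            written.add(dest)
            j += 1
        groups.append(group)
        i = j
    return groups


program = [
    ("r1", ("r2", "r3")),  # 0
    ("r4", ("r1", "r5")),  # 1: needs r1 from instruction 0
    ("r6", ("r7", "r8")),  # 2: independent
    ("r9", ("r6", "r4")),  # 3: needs r6 and r4
]

groups = issue_in_order(program, m=2)
idle_slots = 2 * len(groups) - len(program)
print(groups, idle_slots)  # [[0], [1, 2], [3]] 2
```

With m = 2 the four instructions need three cycles and leave two issue slots empty, which is the "pipelines stalling in a wait state" situation described above.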
Superscalar Execution
Superscalar Implementation
   Simultaneously fetch multiple instructions
   Logic to determine true dependencies
    involving register values
   Mechanisms to communicate these values
   Mechanisms to initiate multiple instructions in
    parallel
   Resources for parallel execution of multiple
    instructions
   Mechanisms for committing process state in
    correct order
Some Architectures
   PowerPC 604
    – six independent execution units:
           Branch execution unit
           Load/Store unit
           3 Integer units
           Floating-point unit
    – in-order issue
    – register renaming
   PowerPC 620
    – adds out-of-order issue to the features of the 604
   Pentium
    – three independent execution units:
           2 Integer units
           Floating point unit
    – in-order issue
VLIW
   Very Long Instruction Word (VLIW) architectures are used for executing more
    than one basic instruction at a time.

   These processors contain multiple functional units. They fetch from the
    instruction cache a very long instruction word containing several basic
    instructions, and dispatch the entire VLIW for parallel execution. These
    capabilities are exploited by compilers, which generate code in which
    independent primitive instructions executable in parallel are grouped together.

   VLIW has been described as a natural successor to RISC (Reduced Instruction
    Set Computing), because it moves complexity from the hardware to the compiler,
    allowing simpler, faster processors.

   VLIW eliminates the complicated instruction scheduling and parallel dispatch
    that occur in most modern microprocessors.
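The compiler's grouping job can be sketched as a greedy packer that fills fixed slots and pads with NOPs. Everything specific here is an assumption for illustration: the three-slot machine (`SLOTS`), single-cycle latencies, and the `pack_bundles` helper do not describe any real VLIW instruction set.

```python
SLOTS = ("mem", "alu", "alu")  # hypothetical machine: one memory unit, two ALUs


def pack_bundles(ops):
    """Greedily pack operations, in program order, into long instruction words.

    ops: list of (unit, dest, srcs, text); dest may be None (e.g. for a store).
    Returns a list of bundles, each a list with one entry per slot.
    """
    bundles = []
    current = [None] * len(SLOTS)
    written = set()  # registers written by operations already in this bundle
    for unit, dest, srcs, text in ops:
        free = [k for k, u in enumerate(SLOTS) if u == unit and current[k] is None]
        depends = any(r in written for r in (dest, *srcs) if r is not None)
        if not free or depends:
            # close the current word and start a new one
            bundles.append(current)
            current = [None] * len(SLOTS)
            written = set()
            free = [k for k, u in enumerate(SLOTS) if u == unit]
        current[free[0]] = text
        if dest is not None:
            written.add(dest)
    if any(slot is not None for slot in current):
        bundles.append(current)
    return [[slot or "nop" for slot in bundle] for bundle in bundles]


ops = [
    ("mem", "r1", ("r0",), "r1 = load r0"),
    ("alu", "r5", ("r6", "r7"), "r5 = add r6, r7"),
    ("alu", "r8", ("r9", "r9"), "r8 = mul r9, r9"),
    ("alu", "r2", ("r2", "r1"), "r2 = add r2, r1"),  # needs r1 and a free ALU
]

for bundle in pack_bundles(ops):
    print(" | ".join(bundle))
```

The hardware then issues each word as a unit, without re-checking dependences; any slot the compiler could not fill is an explicit NOP.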
Why VLIW?
The key to higher performance in microprocessors for a broad range of
applications is the ability to exploit fine-grain, instruction-level
parallelism.

Some methods for exploiting fine-grain parallelism include:

   Pipelining
   Multiple processors
   Superscalar implementation
   Specifying multiple independent operations per instruction
Architecture Comparison: CISC, RISC & VLIW

ARCHITECTURE         CISC                       RISC                        VLIW
CHARACTERISTIC

INSTRUCTION SIZE     Varies                     One size, usually 32 bits   One size

INSTRUCTION FORMAT   Field placement varies     Regular, consistent         Regular, consistent
                                                placement of fields         placement of fields

INSTRUCTION          Varies from simple to      Almost always one           Many simple,
SEMANTICS            complex; possibly many     simple operation            independent operations
                     dependent operations
                     per instruction

REGISTERS            Few, sometimes special     Many, general-purpose       Many, general-purpose

MEMORY REFERENCES    Bundled with operations    Not bundled with            Not bundled with
                     in many different types    operations, i.e.            operations, i.e.
                     of instructions            load/store architecture     load/store architecture

HARDWARE DESIGN      Exploit microcoded         Exploit implementations     Exploit implementations
FOCUS                implementations            with one pipeline and       with multiple pipelines,
                                                no microcode                no microcode and no
                                                                            complex dispatch logic

PICTURES OF FIVE     [pictures not recovered]
TYPICAL INSTRUCTIONS
Advantages of VLIW
   VLIW processors rely on the compiler that generates the VLIW code to
    explicitly specify parallelism. Relying on the compiler has advantages.
   VLIW architecture reduces hardware complexity. VLIW simply moves
    complexity from hardware into software.
What is ILP?

   Instruction-level parallelism (ILP) is a measure of how many of the
    operations in a computer program can be performed simultaneously.
   A system is said to embody ILP if multiple instructions run on it
    at the same time.
   ILP can have a significant effect on performance, which is critical to
    embedded systems.
   ILP provides a form of power saving: work done in parallel allows the
    clock to be slowed.
What we intend to do with ILP?
We use micro-architectural techniques to exploit ILP. The various techniques
    include:
   Instruction pipelining, in which the execution of successive instructions
    is partially overlapped.
   Register renaming, a technique used to avoid unnecessary serialization of
    program operations imposed by the reuse of registers by those operations.
   Speculative execution, which reduces pipeline stalls due to control dependencies.
   Branch prediction, which is used to keep the pipeline full.
   Superscalar execution, in which multiple execution units are used to execute
    multiple instructions in parallel.
   Out-of-order execution, which reduces pipeline stalls due to operand dependencies.
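Of these techniques, register renaming is the easiest to show in isolation. The sketch below gives every write a fresh physical register, so later reuse of an architectural register no longer serializes the code. The `p0`, `p1`, ... names and the `(dest, srcs)` encoding are assumptions for the example.

```python
from itertools import count


def rename(program):
    """Rename destinations so that each write gets its own physical register.

    program: list of (dest, srcs) tuples using architectural register names.
    Returns the same program expressed with physical register names.
    """
    fresh = count()
    current = {}  # architectural register -> physical register holding its latest value
    renamed = []
    for dest, srcs in program:
        # sources read the most recent version; never-written registers keep their name
        new_srcs = tuple(current.get(s, s) for s in srcs)
        phys = f"p{next(fresh)}"
        current[dest] = phys
        renamed.append((phys, new_srcs))
    return renamed


program = [
    ("r1", ("r2", "r3")),  # 0: writes r1
    ("r4", ("r1",)),       # 1: reads r1
    ("r1", ("r5", "r6")),  # 2: reuses r1, conflicting with 0 (output) and 1 (anti)
    ("r7", ("r1",)),       # 3: reads the second r1
]

for line in rename(program):
    print(line)
```

After renaming, instruction 2 writes `p2` instead of `r1`, so it no longer has to wait for instructions 0 and 1; only the true dependences 0 → 1 and 2 → 3 remain.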
Algorithms for scheduling

A few of the instruction scheduling algorithms in use are:

   List scheduling

   Trace scheduling

   Software pipelining (modulo scheduling)
List Scheduling
List scheduling by steps:
1.   Construct a dependence graph of the basic block. (The edges are
     weighted with the latency of the instruction.)

2.   Use the dependence graph to determine the instructions that can execute;
     insert them on a list, called the Ready list.

3.   Use the dependence graph and the Ready list to schedule an instruction
     that causes the smallest possible stall; update the Ready list. Repeat
     until all instructions are scheduled.
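A minimal sketch of these steps for a single-issue machine follows. The latencies, the priority function (latency-weighted longest path to the end of the block) and the tie-break on original order are assumptions; the block is the a = b + c, d = e - f example used in these slides, with instructions numbered 1 to 8.

```python
# Assumed latencies: a load result is usable two cycles after issue.
LATENCY = {"load": 2, "add": 1, "sub": 1, "store": 1}

# instruction number -> (opcode, predecessors in the dependence graph)
BLOCK = {
    1: ("load", []), 2: ("load", []), 3: ("add", [1, 2]), 4: ("store", [3]),
    5: ("load", []), 6: ("load", []), 7: ("sub", [5, 6]), 8: ("store", [7]),
}


def list_schedule(block):
    """Return one entry per cycle: an instruction number, or None for a stall."""
    succs = {n: [] for n in block}
    for n, (_, preds) in block.items():
        for p in preds:
            succs[p].append(n)

    # Priority: longest latency-weighted path from the instruction to the end.
    # Walking in reverse program order visits successors first.
    prio = {}
    for n in sorted(block, reverse=True):
        prio[n] = LATENCY[block[n][0]] + max((prio[s] for s in succs[n]), default=0)

    done_at = {}   # instruction -> cycle at which its result is available
    schedule = []
    cycle = 0
    while len(done_at) < len(block):
        # Ready list: unscheduled instructions whose operands are available now.
        ready = [n for n, (_, preds) in block.items()
                 if n not in done_at
                 and all(p in done_at and done_at[p] <= cycle for p in preds)]
        if ready:
            pick = max(ready, key=lambda n: (prio[n], -n))
            done_at[pick] = cycle + LATENCY[block[pick][0]]
            schedule.append(pick)
        else:
            schedule.append(None)  # nothing can issue: stall
        cycle += 1
    return schedule


print(list_schedule(BLOCK))  # [1, 2, 5, 6, 3, 7, 4, 8]
```

Under these assumptions the scheduler hoists all four loads ahead of the arithmetic and produces a stall-free order. Other tie-breaking rules give different but equally valid orders.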
Code Representation for List Scheduling
      a = b + c
      d = e - f

1.   load R1, b
2.   load R2, c
3.   add R2, R1
4.   store a, R2
5.   load R3, e
6.   load R4, f
7.   sub R3, R4
8.   store d, R3

Dependence graph: 1 → 3, 2 → 3, 3 → 4; 5 → 7, 6 → 7, 7 → 8
Code Representation for List Scheduling
      a = b + c
      d = e - f

Original order       Scheduled order
1. load R1, b        1. load R1, b
2. load R2, c        5. load R3, e
3. add R2, R1        2. load R2, c
4. store a, R2       6. load R4, f
5. load R3, e        3. add R2, R1
6. load R4, f        7. sub R3, R4
7. sub R3, R4        4. store a, R2
8. store d, R3       8. store d, R3

Now we have a schedule that requires no stalls and no NOPs.
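The claim can be checked with a small stall counter. This is a sketch under an assumed timing model (single issue, a load result usable two cycles after issue, everything else one cycle); the slides do not state the latencies, so the numbers are illustrative.

```python
# Assumed latencies, not given in the slides.
LATENCY = {"load": 2, "add": 1, "sub": 1, "store": 1}

# instruction number -> (opcode, instructions whose results it needs)
BLOCK = {
    1: ("load", []), 2: ("load", []), 3: ("add", [1, 2]), 4: ("store", [3]),
    5: ("load", []), 6: ("load", []), 7: ("sub", [5, 6]), 8: ("store", [7]),
}


def count_stalls(order, block=BLOCK, latency=LATENCY):
    """Count the cycles lost waiting for operands when issuing in the given order."""
    cycle = 0
    stalls = 0
    ready_at = {}  # instruction -> cycle at which its result is available
    for n in order:
        op, preds = block[n]
        start = max([cycle] + [ready_at[p] for p in preds])
        stalls += start - cycle
        ready_at[n] = start + latency[op]
        cycle = start + 1
    return stalls


original = [1, 2, 3, 4, 5, 6, 7, 8]
scheduled = [1, 5, 2, 6, 3, 7, 4, 8]
print(count_stalls(original), count_stalls(scheduled))  # 2 0
```

In this model the original order loses one cycle before the add and one before the sub, while the reordered sequence fills both load delay slots with useful work.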
Problem and Solution
   Register allocation conflict: use of the same register creates
    anti-dependencies that restrict scheduling
   Register allocation before scheduling
    – prevents good scheduling
   Scheduling before register allocation
    – spills destroy scheduling
   Solution: schedule abstract assembly, allocate registers, then schedule again
Trace scheduling

Steps involved in Trace Scheduling:
    Trace Selection
    – Find the most common trace of basic blocks.
    Trace Compaction
    – Combine the basic blocks in the trace and schedule them as one block
    – Create clean-up code if the execution goes off-trace
    Parallelism across IF branches vs. LOOP branches
    Can provide a speedup if static prediction is accurate
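The trace selection step can be sketched as a greedy walk over the control-flow graph. The graph encoding, the branch probabilities and the execution counts below are hypothetical profile data. The sketch only grows traces forward from a seed block, whereas production implementations typically also grow them backward.

```python
def select_traces(cfg, freq):
    """Greedy trace selection.

    cfg:  block -> {successor: branch probability}
    freq: block -> estimated execution count
    Returns a list of traces (lists of blocks), most frequent seed first.
    """
    visited = set()
    traces = []
    for seed in sorted(freq, key=freq.get, reverse=True):
        if seed in visited:
            continue
        trace = []
        block = seed
        while block is not None and block not in visited:
            trace.append(block)
            visited.add(block)
            succs = cfg.get(block, {})
            # follow the most likely successor, if there is one
            block = max(succs, key=succs.get) if succs else None
        traces.append(trace)
    return traces


# if (cond) B else C; D   with the branch to B taken 90% of the time
cfg = {"A": {"B": 0.9, "C": 0.1}, "B": {"D": 1.0}, "C": {"D": 1.0}, "D": {}}
freq = {"A": 100, "B": 90, "C": 10, "D": 100}

print(select_traces(cfg, freq))  # [['A', 'B', 'D'], ['C']]
```

The trace A, B, D would then be compacted and scheduled as one block. Block C stays off-trace and needs clean-up code wherever operations were moved across the branch.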
How Trace Scheduling works
Look for the higher-priority blocks and trace them as shown in the figure.
After tracing the priority blocks, schedule them first and the rest
in parallel with them.
The blocks are traced according to their priority.
• Create large extended basic blocks by duplication
• Schedule the larger blocks
The figure shows how the extended basic blocks can be created.
The block diagram in its final stage shows the parallelism across the
branches.
Limitations of Trace Scheduling

   Optimization depends on the traces being the dominant paths
    in the program's control flow.
   Therefore, the following two things should be true:
    – Programs should be skewed in the branches they take at
      run time, for typical mixes of input data.
    – We should have access to this information at compile time.
      Not so easy.
Software Pipelining
   In software pipelining, iterations of a loop in the source program are
    continuously initiated at constant intervals, before the preceding
    iterations complete, thus taking advantage of the parallelism in the data path.
   It is also described as scheduling the operations within an iteration
    such that the iterations can be pipelined to yield optimal throughput.
   The sequence of instructions before the steady state is called the
    PROLOG, and the sequence after the steady state is called the EPILOG.
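The prolog, steady state and epilog fall out naturally once the overlapped iterations are written down cycle by cycle. The sketch below assumes an initiation interval of one cycle and one stage per cycle; the stage names and the `software_pipeline` helper are hypothetical.

```python
def software_pipeline(stages, n):
    """Lay out n loop iterations, starting one new iteration every cycle.

    stages: names of the stages of a single iteration, in order.
    n:      trip count, assumed to be at least len(stages).
    Returns a list of (phase, operations) rows, one row per cycle.
    """
    depth = len(stages)
    rows = []
    for t in range(n + depth - 1):
        # at cycle t, stage s belongs to iteration t - s (if that iteration exists)
        ops = [f"{stages[s]}[{t - s}]" for s in range(depth) if 0 <= t - s < n]
        if t < depth - 1:
            phase = "prolog"   # pipeline still filling
        elif t < n:
            phase = "kernel"   # steady state: every stage is busy
        else:
            phase = "epilog"   # pipeline draining
        rows.append((phase, ops))
    return rows


# sum += a[i] split into a load stage and an add stage, four iterations
for phase, ops in software_pipeline(("load", "add"), 4):
    print(f"{phase:6} {'  '.join(ops)}")
```

For four iterations this prints one prolog row (`load[0]`), three identical-shaped kernel rows such as `load[1]  add[0]`, and one epilog row (`add[3]`). The kernel is what the compiler emits as the new loop body.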
Software Pipelining Example
• Source code:
for (i = 0; i < n; i++) sum += a[i]

• Loop body in assembly:
r1 = L r0
---;stall
r2 = Add r2,r1
r0 = add r0,4

• Unroll loop & allocate registers:
r1 = L r0
---;stall
r2 = Add r2,r1
r0 = Add r0,12

r4 = L r3
---;stall
r2 = Add r2,r4
r3 = add r3,12

r7 = L r6
---;stall
r2 = Add r2,r7
r6 = add r6,12

r10 = L r9
---;stall
r2 = Add r2,r10
r9 = add r9,12
Schedule Unrolled Instructions, exploiting VLIW (or not)
[Figure: schedule of the unrolled instructions, with the PROLOG, the repeating pattern (Kernel) and the EPILOG marked]
Constraints in Software Pipelining

   Recurrence constraints: determined by loop-carried
    data dependencies.
   Resource constraints: determined by the
    total resource requirements.
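In the usual modulo-scheduling formulation, each constraint gives a lower bound on the initiation interval, and the larger of the two wins. The sketch below uses that standard formulation, which the slides do not spell out. The function names and the resource counts for the `sum += a[i]` loop are assumptions.

```python
from math import ceil


def res_mii(uses, units):
    """Resource bound: the busiest resource class decides.

    uses:  resource class -> operations of that class per iteration
    units: resource class -> number of functional units of that class
    """
    return max(ceil(uses[r] / units[r]) for r in uses)


def rec_mii(cycles):
    """Recurrence bound over the dependence cycles of the loop.

    cycles: list of (total latency, total iteration distance) per dependence cycle.
    """
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)


def min_initiation_interval(uses, units, cycles):
    """Lower bound on the number of cycles between successive iteration starts."""
    return max(res_mii(uses, units), rec_mii(cycles))


# sum += a[i]: one load and two adds (sum, pointer) per iteration;
# the sum and pointer updates each form a recurrence of latency 1 and distance 1.
uses = {"mem": 1, "alu": 2}
cycles = [(1, 1), (1, 1)]

print(min_initiation_interval(uses, {"mem": 1, "alu": 1}, cycles))  # 2
print(min_initiation_interval(uses, {"mem": 1, "alu": 2}, cycles))  # 1
```

With a single ALU the loop is resource bound at two cycles per iteration. Adding a second ALU brings the bound down to the recurrence limit of one cycle.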
Remarks on Software Pipelining
   Innermost loops, loops with large trip counts and loops without conditionals
    can be software pipelined.
   Code size increases due to the prolog and epilog.
   Code size increases due to unrolling for MVE (Modulo Variable
    Expansion).
   Software-pipelined loops need their own register allocation strategies.
   Loops with conditionals can be software pipelined if predicated execution
    is supported.
    – Higher resource requirement, but efficient schedule

Parallel Processing Techniques

  • 1. Superscalar and VLIW Architectures
  • 2. Parallel processing [2]: Processing instructions in parallel requires three major tasks: (1) checking dependencies between instructions to determine which instructions can be grouped together for parallel execution; (2) assigning instructions to the functional units on the hardware; (3) determining when instructions are initiated and placed together into a single word.
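The first of these tasks, dependency checking, can be illustrated with a minimal sketch. It assumes a hypothetical `Instr` record with one destination register and a tuple of source registers; the names `Instr`, `dependencies` and `can_group` are illustrative only, and memory dependencies are ignored.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, any number of sources."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def dependencies(first: Instr, second: Instr) -> Set[str]:
    """Return the register dependencies of `second` on an earlier `first`."""
    found = set()
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")  # true dependence: second reads what first writes
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")  # anti-dependence: second overwrites a source of first
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")  # output dependence: both write the same register
    return found


def can_group(first: Instr, second: Instr) -> bool:
    """Conservative rule: group two instructions only if no dependence links them."""
    return not dependencies(first, second)
```

Treating every dependence as a barrier is a deliberately conservative choice for the sketch; a real machine or compiler may relax some of these cases.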
  • 3. Major categories [2]: VLIW (Very Long Instruction Word); EPIC (Explicitly Parallel Instruction Computing).
  • 5. Superscalar Processors [1]: Superscalar processors are designed to exploit more instruction-level parallelism in user programs. Only independent instructions can be executed in parallel without causing a wait state. The amount of instruction-level parallelism varies widely depending on the type of code being executed.
  • 6. Pipelining in Superscalar Processors [1]: To fully utilise a superscalar processor of degree m, m instructions must be executable in parallel. This may not be the case in every clock cycle, and some of the pipelines may then stall in a wait state. In a superscalar processor, a simple operation should have a latency of only one cycle, as in the base scalar processor.
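A small simulation can make the utilisation point concrete. The sketch below assumes in-order issue, at most m instructions per cycle, and a one-cycle latency for every instruction; the function name `in_order_issue` and the dependence table are hypothetical.

```python
def in_order_issue(preds, order, m):
    """Return {instr: issue cycle} for in-order issue of at most m
    instructions per cycle, assuming every result is available one
    cycle after its producer issues."""
    issued = {}
    cycle, used = 0, 0
    for i in order:
        # Earliest cycle at which all operands of i are available.
        ready = max((issued[p] + 1 for p in preds[i]), default=0)
        if ready > cycle or used == m:
            # Either i must wait for an operand or the current group is full.
            cycle = max(ready, cycle + 1)
            used = 0
        issued[i] = cycle
        used += 1
    return issued


# Hypothetical eight-instruction block: 3 needs 1 and 2, 4 needs 3,
# 7 needs 5 and 6, 8 needs 7.
preds = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [], 6: [], 7: [5, 6], 8: [7]}
cycles = in_order_issue(preds, [1, 2, 3, 4, 5, 6, 7, 8], m=2)
print(max(cycles.values()) + 1)  # cycles needed by a degree-2 machine
```

With this table the program order needs six cycles on a degree-2 machine, while the reordered sequence 1, 2, 5, 6, 3, 7, 4, 8 needs four, so several issue slots stay empty whenever dependent instructions sit next to each other.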
  • 9. Superscalar Implementation: simultaneously fetch multiple instructions; logic to determine true dependencies involving register values; mechanisms to communicate these values; mechanisms to initiate multiple instructions in parallel; resources for parallel execution of multiple instructions; mechanisms for committing process state in correct order.
  • 10. Some Architectures: PowerPC 604: six independent execution units (branch execution unit, load/store unit, 3 integer units, floating-point unit); in-order issue; register renaming. PowerPC 620: adds out-of-order issue to the features of the 604. Pentium: three independent execution units (2 integer units, floating-point unit); in-order issue.
  • 11. VLIW: Very Long Instruction Word (VLIW) architectures execute more than one basic instruction at a time. These processors contain multiple functional units; they fetch from the instruction cache a very long instruction word containing several basic instructions and dispatch the entire word for parallel execution. Compilers exploit these capabilities by generating code in which independent primitive instructions that can execute in parallel are grouped together. VLIW has been described as a natural successor to RISC (Reduced Instruction Set Computing), because it moves complexity from the hardware to the compiler, allowing simpler, faster processors. VLIW eliminates the complicated instruction scheduling and parallel dispatch hardware found in most modern microprocessors.
  • 12. Why VLIW? The key to higher performance in microprocessors for a broad range of applications is the ability to exploit fine-grain, instruction-level parallelism. Some methods for exploiting fine-grain parallelism include: pipelining; multiple processors; superscalar implementation; specifying multiple independent operations per instruction.
  • 13. Architecture Comparison: CISC, RISC & VLIW
      Instruction size. CISC: varies. RISC: one size, usually 32 bits. VLIW: one size.
      Instruction format. CISC: field placement varies. RISC: regular, consistent placement of fields. VLIW: regular, consistent placement of fields.
      Instruction semantics. CISC: varies from simple to complex; possibly many dependent operations per instruction. RISC: almost always one simple operation. VLIW: many simple, independent operations.
      Registers. CISC: few, sometimes special. RISC: many, general-purpose. VLIW: many, general-purpose.
  • 14. Architecture Comparison: CISC, RISC & VLIW
      Memory references. CISC: bundled with operations in many different types of instructions. RISC: not bundled with operations, i.e., load/store architecture. VLIW: not bundled with operations, i.e., load/store architecture.
      Hardware design focus. CISC: exploit microcoded implementations. RISC: exploit implementations with one pipeline and no microcode. VLIW: exploit implementations with multiple pipelines, no microcode and no complex dispatch logic.
      Pictures of five typical instructions: (figures not reproduced)
  • 15. Advantages of VLIW: VLIW processors rely on the compiler that generates the VLIW code to explicitly specify parallelism. Relying on the compiler has advantages. VLIW architecture reduces hardware complexity. VLIW simply moves complexity from hardware into software.
  • 16. What is ILP? Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously. A system is said to embody ILP if multiple instructions can run on it at the same time. ILP can have a significant effect on performance, which is critical to embedded systems. ILP provides a form of power saving by slowing the clock.
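One common way to put a number on this measure is to divide the instruction count by the length of the longest dependence chain. The sketch below assumes unit latency for every instruction and unlimited functional units; `average_ilp` is a hypothetical helper name, not something defined in the slides.

```python
def average_ilp(deps):
    """deps maps each instruction id to the ids it depends on (must be acyclic).

    Returns instruction count divided by critical-path length, i.e. the
    average number of instructions that could run per cycle on an ideal
    machine with unit latencies.
    """
    depth = {}

    def level(node):
        # Length of the longest dependence chain ending at `node`.
        if node not in depth:
            depth[node] = 1 + max((level(p) for p in deps[node]), default=0)
        return depth[node]

    critical_path = max(level(n) for n in deps)
    return len(deps) / critical_path
```

A fully serial chain gives 1.0, and a block of independent instructions gives a value equal to its length.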
  • 17. What do we intend to do with ILP? We use micro-architectural techniques to exploit ILP. The various techniques include: instruction pipelining, which depends on CPU caches; register renaming, a technique used to avoid unnecessary serialization of program operations imposed by the reuse of registers by those operations; speculative execution, which reduces pipeline stalls due to control dependencies; branch prediction, which is used to keep the pipeline full; superscalar execution, in which multiple execution units are used to execute multiple instructions in parallel; out-of-order execution, which reduces pipeline stalls due to operand dependencies.
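Register renaming is the easiest of these techniques to show in a few lines. The sketch below assumes an unlimited pool of physical registers named `p0`, `p1`, ... and instructions given as hypothetical `(dest, srcs)` pairs; it is an illustration of the idea, not a model of any particular processor.

```python
from itertools import count


def rename(instrs):
    """instrs: list of (dest, srcs) pairs using architectural register names.

    Every write receives a fresh physical register, which removes WAR and WAW
    dependencies while preserving true (RAW) dependencies. Assumes no
    architectural register is itself named like "p0", "p1", ...
    """
    fresh = count()
    mapping = {}   # architectural name -> most recent physical name
    renamed = []
    for dest, srcs in instrs:
        # Sources read the latest mapping, or keep their name if never written.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = f"p{next(fresh)}"
        mapping[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed
```

For example, in the sequence `R1 = f(R2, R3)`, `R2 = g(R4)`, `R1 = h(R2)`, the second instruction no longer has to wait for the first to read `R2`, and the two writes to `R1` go to different physical registers.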
  • 18. Algorithms for scheduling: A few of the instruction scheduling algorithms in use are: list scheduling; trace scheduling; software pipelining (modulo scheduling).
  • 19. List Scheduling: List scheduling by steps: (1) Construct a dependence graph of the basic block. (The edges are weighted with the latency of the instruction.) (2) Use the dependence graph to determine the instructions that can execute; insert them on a list, called the Ready list. (3) Use the dependence graph and the Ready list to schedule an instruction that causes the smallest possible stall; update the Ready list. Repeat until every instruction is scheduled.
  • 20. Code Representation for List Scheduling
      Source: a = b + c; d = e - f
      Instructions:
        1. load R1, b
        2. load R2, c
        3. add R2, R1
        4. store a, R2
        5. load R3, e
        6. load R4, f
        7. sub R3, R4
        8. store d, R3
      Dependence graph (figure): 1 and 2 feed 3, which feeds 4; 5 and 6 feed 7, which feeds 8.
  • 21. Code Representation for List Scheduling
      Source: a = b + c; d = e - f
      Original order:
        1. load R1, b
        2. load R2, c
        3. add R2, R1
        4. store a, R2
        5. load R3, e
        6. load R4, f
        7. sub R3, R4
        8. store d, R3
      Scheduled order:
        1. load R1, b
        5. load R3, e
        2. load R2, c
        6. load R4, f
        3. add R2, R1
        7. sub R3, R4
        4. store a, R2
        8. store d, R3
      Now we have a schedule that requires no stalls and no NOPs.
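The list-scheduling steps can be sketched for the eight-instruction example above. The slides do not give latencies, so the sketch assumes a single-issue machine where a load result is usable two cycles after issue and every other result after one cycle; the tie-breaking rule (longest remaining path, then lowest instruction number) is also an assumption, and `list_schedule` is a hypothetical name.

```python
def list_schedule(latency, preds):
    """Single-issue list scheduling of one basic block.

    latency: {instr: cycles until its result is usable}
    preds:   {instr: list of instructions it depends on}
    Returns a list of (issue_cycle, instr) pairs.
    """
    succs = {i: [] for i in preds}
    for i, ps in preds.items():
        for p in ps:
            succs[p].append(i)

    # Priority: latency-weighted longest path to the end of the block.
    path = {}

    def longest(i):
        if i not in path:
            path[i] = latency[i] + max((longest(s) for s in succs[i]), default=0)
        return path[i]

    issued = {}  # instr -> issue cycle

    def earliest(i):
        # First cycle at which all operands of i are available.
        return max((issued[p] + latency[p] for p in preds[i]), default=0)

    schedule = []
    cycle = 0
    while len(issued) < len(preds):
        # Ready list: not yet issued, all predecessors already issued.
        ready = [i for i in preds
                 if i not in issued and all(p in issued for p in preds[i])]
        # Smallest stall first, then longest remaining path, then lowest number.
        best = min(ready, key=lambda i: (max(earliest(i), cycle), -longest(i), i))
        cycle = max(earliest(best), cycle)
        issued[best] = cycle
        schedule.append((cycle, best))
        cycle += 1
    return schedule


# Assumed latencies: loads (1, 2, 5, 6) take 2 cycles, everything else 1.
latency = {1: 2, 2: 2, 3: 1, 4: 1, 5: 2, 6: 2, 7: 1, 8: 1}
preds = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [], 6: [], 7: [5, 6], 8: [7]}
print(list_schedule(latency, preds))
```

Under these assumptions the sketch issues the instructions in the order 1, 2, 5, 6, 3, 7, 4, 8 in eight consecutive cycles. That order differs from the one on the slide, but it is equally free of stalls; the exact result of list scheduling depends on how ties in the Ready list are broken.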
  • 22. Problem and Solution: Register allocation conflict: use of the same register creates anti-dependencies that restrict scheduling. Register allocation before scheduling prevents good scheduling. Scheduling before register allocation: spills destroy the schedule. Solution: schedule abstract assembly, allocate registers, then schedule again.
  • 23. Trace scheduling: Steps involved in trace scheduling: Trace selection: find the most common trace of basic blocks. Trace compaction: combine the basic blocks in the trace and schedule them as one block; create clean-up code in case execution goes off-trace. Parallelism across IF branches vs. LOOP branches. Can provide a speedup if static prediction is accurate.
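Trace selection can be sketched as a greedy walk over the control-flow graph. The version below is a simplification that only grows the trace forward from a given entry block and assumes edge probabilities are already known (for example from profiling); `select_trace` and the graph layout are hypothetical.

```python
def select_trace(cfg, entry):
    """cfg maps a block name to {successor: estimated probability of that edge}.

    Starting at `entry`, repeatedly follow the most likely outgoing edge
    until an exit block or an already visited block is reached.
    """
    trace, seen = [], set()
    block = entry
    while block is not None and block not in seen:
        trace.append(block)
        seen.add(block)
        succs = cfg.get(block, {})
        # Follow the most likely outgoing edge; stop at an exit block.
        block = max(succs, key=succs.get) if succs else None
    return trace


# Hypothetical diamond: A branches to B (90%) or C (10%), both rejoin at D.
cfg = {"A": {"B": 0.9, "C": 0.1}, "B": {"D": 1.0}, "C": {"D": 1.0}, "D": {}}
print(select_trace(cfg, "A"))  # ['A', 'B', 'D']
```

Block C is left off the trace, so it is where clean-up (compensation) code would be needed if the scheduler moves instructions across the branch in A or the join in D.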
  • 24. How Trace Scheduling works: Look for the higher-priority blocks and trace them (figure).
  • 25. How Trace Scheduling works: After tracing the priority blocks, schedule them first and the rest in parallel with them.
  • 26. How Trace Scheduling works: The blocks are traced according to their priority (figure).
  • 27. How Trace Scheduling works: Create large extended basic blocks by duplication; schedule the larger blocks. The figure shows how the extended basic blocks can be created.
  • 28. How Trace Scheduling works: The block diagram in its final stage shows the parallelism across the branches.
  • 29. Limitations of Trace Scheduling: Optimization depends on the traces being the dominant paths in the program's control flow. Therefore, the following two things should be true: (a) programs should show skewed branch behavior at run time for typical mixes of input data; (b) we should have access to this information at compile time. Not so easy.
  • 30. Software Pipelining: In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete, thus taking advantage of the parallelism in the data path. It is also described as scheduling the operations within an iteration such that the iterations can be pipelined to yield optimal throughput. The sequence of instructions before the steady state is called the PROLOG, and the sequence after the steady state is called the EPILOG.
  • 31. Software Pipelining Example
      Source code:
        for (i = 0; i < n; i++) sum += a[i]
      Loop body in assembly:
        r1 = L r0
        ---                ; stall
        r2 = Add r2, r1
        r0 = Add r0, 4
      Unroll loop and allocate registers:
        r1 = L r0
        ---                ; stall
        r2 = Add r2, r1
        r0 = Add r0, 12
        r4 = L r3
        ---                ; stall
        r2 = Add r2, r4
        r3 = Add r3, 12
        r7 = L r6
        ---                ; stall
        r2 = Add r2, r7
        r6 = Add r6, 12
        r10 = L r9
        ---                ; stall
        r2 = Add r2, r10
        r9 = Add r9, 12
  • 33. Software Pipelining Example: Schedule the unrolled instructions, exploiting VLIW (or not), and identify the repeating pattern (kernel). The figure labels the PROLOG, the kernel and the EPILOG.
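The prolog, kernel and epilog structure can be reproduced with a small sketch. It assumes that each stage of an iteration takes one cycle, that a new iteration starts every cycle, and that the machine has enough functional units to run one operation of every stage at once; `software_pipeline` and the stage names are hypothetical.

```python
def software_pipeline(stages, n):
    """Lay out n loop iterations, starting one new iteration per cycle.

    stages: names of the stages of one iteration, one cycle each.
    Returns a list of (phase, ops) rows, one per cycle, where ops is a
    list of (stage name, iteration index) pairs executing in that cycle.
    """
    depth = len(stages)
    if n < depth:
        raise ValueError("need at least as many iterations as stages")
    rows = []
    for cycle in range(n + depth - 1):
        # Stage s of iteration (cycle - s) runs in this cycle, if it exists.
        ops = [(stages[s], cycle - s) for s in range(depth) if 0 <= cycle - s < n]
        if cycle < depth - 1:
            phase = "prolog"   # pipeline still filling
        elif cycle < n:
            phase = "kernel"   # steady state: every stage is busy
        else:
            phase = "epilog"   # pipeline draining
        rows.append((phase, ops))
    return rows


for phase, ops in software_pipeline(["load", "add"], 4):
    print(phase, ops)
```

For the two-stage loop and four iterations this gives one prolog cycle, three kernel cycles in which the load of iteration i+1 overlaps the add of iteration i, and one epilog cycle.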
  • 34. Constraints in Software Pipelining: Recurrence constraints: determined by loop-carried data dependencies. Resource constraints: determined by total resource requirements.
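In modulo scheduling these two constraints are usually turned into a lower bound on the initiation interval: the larger of a resource bound and a recurrence bound. The sketch below uses the standard formulas; the function name, the argument layout and the example machine are assumptions made for illustration.

```python
from math import ceil


def min_initiation_interval(uses, units, recurrences):
    """Lower bound on the initiation interval of a software-pipelined loop.

    uses:        {resource: operations per iteration that need it}
    units:       {resource: number of functional units of that kind}
    recurrences: list of (total latency, iteration distance) pairs, one
                 per loop-carried dependence cycle
    """
    # Resource bound: the busiest resource limits how often iterations start.
    res_mii = max((ceil(uses[r] / units[r]) for r in uses), default=1)
    # Recurrence bound: a dependence cycle of latency L spanning d iterations
    # allows a new iteration at most every ceil(L / d) cycles.
    rec_mii = max((ceil(lat / dist) for lat, dist in recurrences), default=1)
    return max(res_mii, rec_mii)


# Hypothetical machine with one memory unit and one ALU; the loop has one
# load, two adds, and a one-cycle recurrence on the running sum.
print(min_initiation_interval({"mem": 1, "alu": 2}, {"mem": 1, "alu": 1}, [(1, 1)]))
```

In this example the ALU is the bottleneck, so the bound is 2; adding a second ALU would bring it down to 1, where the recurrence on the sum becomes the limit.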
  • 35. Remarks on Software Pipelining: Innermost loops, loops with larger trip counts and loops without conditionals can be software pipelined. Code size increases due to the prolog and epilog. Code size increases due to unrolling for MVE (Modulo Variable Expansion). Register allocation strategies for software-pipelined loops must be considered. Loops with conditionals can be software pipelined if predicated execution is supported; this has a higher resource requirement but gives an efficient schedule.