SlideShare une entreprise Scribd logo
1  sur  20
Instruction Scheduling
Sourabh Kumar
ECE
Pipeline…
A pipeline is the mechanism a RISC processor uses to execute instructions.
Using a pipeline speeds up execution by fetching the next instruction while
other instructions are being decoded and executed.
The ARM pipeline has not processed an instruction until it passes completely
through the execute stage.
Three Stage Pipeline Five Stage Pipeline
Pipeline Execution…
Fetch Decode ALU LS1 LS2
pc Pc-4 Pc-8 pc-12 Pc-16
Action
Instruction Address
Fetch: get the instruction from memory into core.
Decode: decodes the previous instruction and reads the data operands from
bank registers or forwarding paths.
ALU: executed the previously executed instruction.
LS1: load or store instruction gets executed like LDR.
LS2: LDRB and STRB or LDRH and STRH gets executed.
Execution Cycles…
The following rules summarize the cycle timings for common instruction:
The time taken to execute instructions depends on the implementation
pipeline.
Instructions that are conditional on the value of the ARM condition codes
in the CPSR take one cycle if the condition is not met.
If the condition is met, then the following rules apply given on next slide:
Execution Cycles…
ALU operations such as addition, subtraction, and logical operations, shift by an
immediate value takes One cycle.
Register-specified shift takes Two cycles.
Load instructions such as LDR and LDM take N cycles.
Load instructions that load 16 or 8-bit data such as LDRB, take One cycle.
Branch instructions take Three cycles.
Store instructions that store N values take N cycles. An STM of a single value is
exceptional, taking Two cycles.
Multiply instructions take a varying number of cycles depending on the value of the
second operand in the product.
Pipeline Interlock…
If an instruction requires the result of a previous instruction that is not available,
then the processor stalls. This is called a pipeline hazard or pipeline Interlock.
Example: where there is No Interlock.
ADD r0, r0, r1
ADD r0, r0, r2
This instruction pair takes Two cycles.
The ALU calculates r0 + r1 in one cycle. Therefore this result is available for the
ALU to calculate r0 + r2 in the second cycle.
Pipeline Interlock…
Example: where there is one-cycle interlock.
LDR r1, [r2, #4]
ADD r0, r0, r1
This instruction pair takes Three cycles.
The ALU calculates the address r2 + 4 in the first cycle while decoding the
ADD instruction in parallel.
The ADD cannot proceed on the second cycle because the load instruction
has not yet loaded the value of r1.
Pipeline Interlock…
Pipeline Fetch Decode ALU LS1 LS2
Cycle 1 … ADD LDR …
Cycle 2 … ADD LDR …
Cycle 3 … ADD ---- LDR
The processor executes the ADD in the ALU on the third cycle.
 The processor stalls the ADD instruction for one cycle in the ALU stage of The
pipeline while the load instruction completes the LS1 stage.
 The LDR instruction proceeds down the pipeline, but the add instruction is stalled, a
gap opens up between them. This gap is sometimes called a pipeline bubble.
Why a branch instruction takes
three cycles???
MOV r1, #1
B case1
AND r0, r0, r1
EOR r2, r2, r3
...
case1
SUB r0, r0, r1
The MOV instruction executes on the first
cycle.
On the second cycle, the branch instruction
calculates the destination address.
This causes the core to flush the pipeline
and refill it using this new pc value.
The refill takes two cycles. Finally, the SUB
instruction executes normally.
Why a branch instruction takes
three cycles???
Pipeline Fetch Decode ALU LS1 LS2
Cycle 1 AND B MOV …
Cycle 2 EOR AND B MOV …
Cycle 3 SUB ---- ---- B MOV
Cycle 4 … SUB ---- ---- B
Cycle 5 … SUB ---- ----
 The pipeline drops the two instructions following the branch when the
branch takes place.
Scheduling of load instructions…
Load instructions occur frequently in compiled code, accounting for
approximately one third of all instructions.
Careful scheduling of load instructions so that pipeline stalls don’t occur can
improve performance.
The compiler attempts to schedule the code as best it can, but the aliasing
problems of C limits the available optimizations.
The compiler cannot move a load instruction before a store instruction
unless it is certain that the two pointers used do not point to the same
address.
C PROGRAM:
void str_tolower(char *out, char *in)
{
unsigned int c;
do
{
c = *(in++);
if (c>=’A’ && c<=’Z’)
{
c = c + (’a’ -’A’);
}
*(out++) = (char)c;
}
while (c);
}
Scheduling of load
instructions…
A B C D /0
65 66 67 68
*in
a b c d /0
97 98 99 100
*out
Scheduling of load instructions…
str_tolower
LDRB r2,[r1],#1 ; c = *(in++)
SUB r3,r2,#0x41 ; r3 = c -‘A’
CMP r3,#0x19 ; if (c <=‘Z’-‘A’)
ADDLS r2,r2,#0x20 ; c +=‘a’-‘A’
STRB r2,[r0],#1 ; *(out++) = (char)c
CMP r2,#0 ; if (c!=0)
BNE str_tolower ; goto str_tolower
MOV pc,r14 ; return
The SUB instruction uses the value of c
directly after the LDRB instruction that
loads c.
The ARM pipeline will stall for two cycles.
Overall there are 11 execution cycles which
are needed to execute the whole program.
 There are two ways you can alter the
structure of the algorithm to avoid the
cycles by using assembly -- load
scheduling by preloading and unrolling.
Load scheduling by preloading…
Preloading: loads the data required for the loop at the end of the previous
loop, rather than at the beginning of the current loop.
Since loop i is loading data for loop i+1, there is always a problem with the
first and last loops. For the first loop, insert an extra load outside the loop.
For the last loop, be careful not to read any data. This can be effectively done
by conditional execution.
This reduces the loop from 11 cycles per character to 9 cycles per character
on an ARM, giving a 1.22 times speed improvement.
Load scheduling by preloading…
Load scheduling by unrolling…
Unroll and interleave the body of the loop.
For example, we can perform three loops together.
When the result of an operation from loop i is not ready, we can perform an
operation from loop i+1 that avoids waiting for the loop i result.
This loop is the most efficient implementation we’ve looked at so far. The
implementation requires seven cycles per character on ARM. This gives a
1.57 times speed increase over the original str_tolower.
Load scheduling by unrolling…
Load scheduling by unrolling…
Load scheduling by unrolling…
 More than doubling the code size
 Only efficient for a large data size
Instruction scheduling

Contenu connexe

Tendances

Final draft intel core i5 processors architecture
Final draft intel core i5 processors architectureFinal draft intel core i5 processors architecture
Final draft intel core i5 processors architectureJawid Ahmad Baktash
 
Static and Dynamic Read/Write memories
Static and Dynamic Read/Write memoriesStatic and Dynamic Read/Write memories
Static and Dynamic Read/Write memoriesAbhilash Nair
 
ARM architcture
ARM architcture ARM architcture
ARM architcture Hossam Adel
 
Physical Design of IoT.pdf
Physical Design of IoT.pdfPhysical Design of IoT.pdf
Physical Design of IoT.pdfJoshuaKimmich1
 
Technology Introduction Series: Edge Computing tutorial.pdf
Technology Introduction Series: Edge Computing tutorial.pdfTechnology Introduction Series: Edge Computing tutorial.pdf
Technology Introduction Series: Edge Computing tutorial.pdf3G4G
 
Single instruction multiple data
Single instruction multiple dataSingle instruction multiple data
Single instruction multiple dataSyed Zaid Irshad
 
IoT Communication Protocols
IoT Communication ProtocolsIoT Communication Protocols
IoT Communication ProtocolsPradeep Kumar TS
 
RPL - Routing Protocol for Low Power and Lossy Networks
RPL - Routing Protocol for Low Power and Lossy NetworksRPL - Routing Protocol for Low Power and Lossy Networks
RPL - Routing Protocol for Low Power and Lossy NetworksPradeep Kumar TS
 
Asynchronous transfer mode (atm) in computer network
Asynchronous transfer mode (atm) in computer networkAsynchronous transfer mode (atm) in computer network
Asynchronous transfer mode (atm) in computer networkMH Shihab
 
load balancing in public cloud ppt
load balancing in public cloud pptload balancing in public cloud ppt
load balancing in public cloud pptKrishna Kumar
 
Mobile Network Layer
Mobile Network LayerMobile Network Layer
Mobile Network LayerRahul Hada
 

Tendances (20)

Tcp
TcpTcp
Tcp
 
Final draft intel core i5 processors architecture
Final draft intel core i5 processors architectureFinal draft intel core i5 processors architecture
Final draft intel core i5 processors architecture
 
Wireless Sensor Networks ppt
Wireless Sensor Networks pptWireless Sensor Networks ppt
Wireless Sensor Networks ppt
 
RFID with INTERNET OF THINGS
RFID with INTERNET OF THINGSRFID with INTERNET OF THINGS
RFID with INTERNET OF THINGS
 
Static and Dynamic Read/Write memories
Static and Dynamic Read/Write memoriesStatic and Dynamic Read/Write memories
Static and Dynamic Read/Write memories
 
IP addressing
IP addressingIP addressing
IP addressing
 
ARM architcture
ARM architcture ARM architcture
ARM architcture
 
Distributed System ppt
Distributed System pptDistributed System ppt
Distributed System ppt
 
Fog computing in IoT
Fog computing in IoTFog computing in IoT
Fog computing in IoT
 
Unit vi (1)
Unit vi (1)Unit vi (1)
Unit vi (1)
 
Physical Design of IoT.pdf
Physical Design of IoT.pdfPhysical Design of IoT.pdf
Physical Design of IoT.pdf
 
Technology Introduction Series: Edge Computing tutorial.pdf
Technology Introduction Series: Edge Computing tutorial.pdfTechnology Introduction Series: Edge Computing tutorial.pdf
Technology Introduction Series: Edge Computing tutorial.pdf
 
Single instruction multiple data
Single instruction multiple dataSingle instruction multiple data
Single instruction multiple data
 
IoT Communication Protocols
IoT Communication ProtocolsIoT Communication Protocols
IoT Communication Protocols
 
RPL - Routing Protocol for Low Power and Lossy Networks
RPL - Routing Protocol for Low Power and Lossy NetworksRPL - Routing Protocol for Low Power and Lossy Networks
RPL - Routing Protocol for Low Power and Lossy Networks
 
Intel Core i7 Processors
Intel Core i7 ProcessorsIntel Core i7 Processors
Intel Core i7 Processors
 
Asynchronous transfer mode (atm) in computer network
Asynchronous transfer mode (atm) in computer networkAsynchronous transfer mode (atm) in computer network
Asynchronous transfer mode (atm) in computer network
 
load balancing in public cloud ppt
load balancing in public cloud pptload balancing in public cloud ppt
load balancing in public cloud ppt
 
Module 1 8086
Module 1 8086Module 1 8086
Module 1 8086
 
Mobile Network Layer
Mobile Network LayerMobile Network Layer
Mobile Network Layer
 

Similaire à Instruction scheduling

18CS44-MES-Module-2(Chapter 6).pptx
18CS44-MES-Module-2(Chapter 6).pptx18CS44-MES-Module-2(Chapter 6).pptx
18CS44-MES-Module-2(Chapter 6).pptxrakshitha481121
 
A Lightweight Instruction Scheduling Algorithm For Just In Time Compiler
A Lightweight Instruction Scheduling Algorithm For Just In Time CompilerA Lightweight Instruction Scheduling Algorithm For Just In Time Compiler
A Lightweight Instruction Scheduling Algorithm For Just In Time Compilerkeanumit
 
pipelining ppt.pdf
pipelining ppt.pdfpipelining ppt.pdf
pipelining ppt.pdfWilliamTom9
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System AchitectureYashiUpadhyay3
 
Advanced Techniques for Exploiting ILP
Advanced Techniques for Exploiting ILPAdvanced Techniques for Exploiting ILP
Advanced Techniques for Exploiting ILPA B Shinde
 
Pipelining of Processors Computer Architecture
Pipelining of  Processors Computer ArchitecturePipelining of  Processors Computer Architecture
Pipelining of Processors Computer ArchitectureHaris456
 
High Performance Computer Architecture
High Performance Computer ArchitectureHigh Performance Computer Architecture
High Performance Computer ArchitectureSubhasis Dash
 
Computer organisation and architecture .
Computer organisation and architecture .Computer organisation and architecture .
Computer organisation and architecture .MalligaarjunanN
 
Pipelining 16 computers Artitacher pdf
Pipelining   16 computers Artitacher  pdfPipelining   16 computers Artitacher  pdf
Pipelining 16 computers Artitacher pdfMadhuGupta99385
 
Pipeline and data hazard
Pipeline and data hazardPipeline and data hazard
Pipeline and data hazardWaed Shagareen
 
Topic2a ss pipelines
Topic2a ss pipelinesTopic2a ss pipelines
Topic2a ss pipelinesturki_09
 
Unit 2 contd. and( unit 3 voice over ppt)
Unit 2 contd. and( unit 3   voice over ppt)Unit 2 contd. and( unit 3   voice over ppt)
Unit 2 contd. and( unit 3 voice over ppt)Dr Reeja S R
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesDilum Bandara
 

Similaire à Instruction scheduling (20)

IS.pptx
IS.pptxIS.pptx
IS.pptx
 
18CS44-MES-Module-2(Chapter 6).pptx
18CS44-MES-Module-2(Chapter 6).pptx18CS44-MES-Module-2(Chapter 6).pptx
18CS44-MES-Module-2(Chapter 6).pptx
 
A Lightweight Instruction Scheduling Algorithm For Just In Time Compiler
A Lightweight Instruction Scheduling Algorithm For Just In Time CompilerA Lightweight Instruction Scheduling Algorithm For Just In Time Compiler
A Lightweight Instruction Scheduling Algorithm For Just In Time Compiler
 
pipelining ppt.pdf
pipelining ppt.pdfpipelining ppt.pdf
pipelining ppt.pdf
 
Assembly p1
Assembly p1Assembly p1
Assembly p1
 
mod 3-1.pptx
mod 3-1.pptxmod 3-1.pptx
mod 3-1.pptx
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System Achitecture
 
3 Pipelining
3 Pipelining3 Pipelining
3 Pipelining
 
Advanced Techniques for Exploiting ILP
Advanced Techniques for Exploiting ILPAdvanced Techniques for Exploiting ILP
Advanced Techniques for Exploiting ILP
 
Pipelining of Processors Computer Architecture
Pipelining of  Processors Computer ArchitecturePipelining of  Processors Computer Architecture
Pipelining of Processors Computer Architecture
 
High Performance Computer Architecture
High Performance Computer ArchitectureHigh Performance Computer Architecture
High Performance Computer Architecture
 
Computer organisation and architecture .
Computer organisation and architecture .Computer organisation and architecture .
Computer organisation and architecture .
 
Pipelining 16 computers Artitacher pdf
Pipelining   16 computers Artitacher  pdfPipelining   16 computers Artitacher  pdf
Pipelining 16 computers Artitacher pdf
 
Pipelining & All Hazards Solution
Pipelining  & All Hazards SolutionPipelining  & All Hazards Solution
Pipelining & All Hazards Solution
 
Pipelining slides
Pipelining slides Pipelining slides
Pipelining slides
 
Coa.ppt2
Coa.ppt2Coa.ppt2
Coa.ppt2
 
Pipeline and data hazard
Pipeline and data hazardPipeline and data hazard
Pipeline and data hazard
 
Topic2a ss pipelines
Topic2a ss pipelinesTopic2a ss pipelines
Topic2a ss pipelines
 
Unit 2 contd. and( unit 3 voice over ppt)
Unit 2 contd. and( unit 3   voice over ppt)Unit 2 contd. and( unit 3   voice over ppt)
Unit 2 contd. and( unit 3 voice over ppt)
 
Instruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler TechniquesInstruction Level Parallelism – Compiler Techniques
Instruction Level Parallelism – Compiler Techniques
 

Plus de SOURABH KUMAR

How to give feedback
How to give feedbackHow to give feedback
How to give feedbackSOURABH KUMAR
 
What to do in interview
What to do in interviewWhat to do in interview
What to do in interviewSOURABH KUMAR
 
Common questions for Interview
Common questions for InterviewCommon questions for Interview
Common questions for InterviewSOURABH KUMAR
 
Human resources management and planning
Human resources management and planningHuman resources management and planning
Human resources management and planningSOURABH KUMAR
 

Plus de SOURABH KUMAR (8)

How to give feedback
How to give feedbackHow to give feedback
How to give feedback
 
What to do in interview
What to do in interviewWhat to do in interview
What to do in interview
 
Common questions for Interview
Common questions for InterviewCommon questions for Interview
Common questions for Interview
 
Job design
Job designJob design
Job design
 
Job Analysis
Job AnalysisJob Analysis
Job Analysis
 
Human resources management and planning
Human resources management and planningHuman resources management and planning
Human resources management and planning
 
Rf module
Rf moduleRf module
Rf module
 
Ip packet delivery
Ip packet deliveryIp packet delivery
Ip packet delivery
 

Dernier

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 

Dernier (20)

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 

Instruction scheduling

  • 2. Pipeline… A pipeline is the mechanism a RISC processor uses to execute instructions. Using a pipeline speeds up execution by fetching the next instruction while other instructions are being decoded and executed. The ARM pipeline has not processed an instruction until it passes completely through the execute stage. Three Stage Pipeline Five Stage Pipeline
  • 3. Pipeline Execution… Fetch Decode ALU LS1 LS2 pc Pc-4 Pc-8 pc-12 Pc-16 Action Instruction Address Fetch: get the instruction from memory into core. Decode: decodes the previous instruction and reads the data operands from bank registers or forwarding paths. ALU: executed the previously executed instruction. LS1: load or store instruction gets executed like LDR. LS2: LDRB and STRB or LDRH and STRH gets executed.
  • 4. Execution Cycles… The following rules summarize the cycle timings for common instruction: The time taken to execute instructions depends on the implementation pipeline. Instructions that are conditional on the value of the ARM condition codes in the CPSR take one cycle if the condition is not met. If the condition is met, then the following rules apply given on next slide:
  • 5. Execution Cycles… ALU operations such as addition, subtraction, and logical operations, shift by an immediate value takes One cycle. Register-specified shift takes Two cycles. Load instructions such as LDR and LDM take N cycles. Load instructions that load 16 or 8-bit data such as LDRB, take One cycle. Branch instructions take Three cycles. Store instructions that store N values take N cycles. An STM of a single value is exceptional, taking Two cycles. Multiply instructions take a varying number of cycles depending on the value of the second operand in the product.
  • 6. Pipeline Interlock… If an instruction requires the result of a previous instruction that is not available, then the processor stalls. This is called a pipeline hazard or pipeline Interlock. Example: where there is No Interlock. ADD r0, r0, r1 ADD r0, r0, r2 This instruction pair takes Two cycles. The ALU calculates r0 + r1 in one cycle. Therefore this result is available for the ALU to calculate r0 + r2 in the second cycle.
  • 7. Pipeline Interlock… Example: where there is one-cycle interlock. LDR r1, [r2, #4] ADD r0, r0, r1 This instruction pair takes Three cycles. The ALU calculates the address r2 + 4 in the first cycle while decoding the ADD instruction in parallel. The ADD cannot proceed on the second cycle because the load instruction has not yet loaded the value of r1.
  • 8. Pipeline Interlock… Pipeline Fetch Decode ALU LS1 LS2 Cycle 1 … ADD LDR … Cycle 2 … ADD LDR … Cycle 3 … ADD ---- LDR The processor executes the ADD in the ALU on the third cycle.  The processor stalls the ADD instruction for one cycle in the ALU stage of The pipeline while the load instruction completes the LS1 stage.  The LDR instruction proceeds down the pipeline, but the add instruction is stalled, a gap opens up between them. This gap is sometimes called a pipeline bubble.
  • 9. Why a branch instruction takes three cycles??? MOV r1, #1 B case1 AND r0, r0, r1 EOR r2, r2, r3 ... case1 SUB r0, r0, r1 The MOV instruction executes on the first cycle. On the second cycle, the branch instruction calculates the destination address. This causes the core to flush the pipeline and refill it using this new pc value. The refill takes two cycles. Finally, the SUB instruction executes normally.
  • 10. Why a branch instruction takes three cycles??? Pipeline Fetch Decode ALU LS1 LS2 Cycle 1 AND B MOV … Cycle 2 EOR AND B MOV … Cycle 3 SUB ---- ---- B MOV Cycle 4 … SUB ---- ---- B Cycle 5 … SUB ---- ----  The pipeline drops the two instructions following the branch when the branch takes place.
  • 11. Scheduling of load instructions… Load instructions occur frequently in compiled code, accounting for approximately one third of all instructions. Careful scheduling of load instructions so that pipeline stalls don’t occur can improve performance. The compiler attempts to schedule the code as best it can, but the aliasing problems of C limits the available optimizations. The compiler cannot move a load instruction before a store instruction unless it is certain that the two pointers used do not point to the same address.
  • 12. C PROGRAM: void str_tolower(char *out, char *in) { unsigned int c; do { c = *(in++); if (c>=’A’ && c<=’Z’) { c = c + (’a’ -’A’); } *(out++) = (char)c; } while (c); } Scheduling of load instructions… A B C D /0 65 66 67 68 *in a b c d /0 97 98 99 100 *out
  • 13. Scheduling of load instructions… str_tolower LDRB r2,[r1],#1 ; c = *(in++) SUB r3,r2,#0x41 ; r3 = c -‘A’ CMP r3,#0x19 ; if (c <=‘Z’-‘A’) ADDLS r2,r2,#0x20 ; c +=‘a’-‘A’ STRB r2,[r0],#1 ; *(out++) = (char)c CMP r2,#0 ; if (c!=0) BNE str_tolower ; goto str_tolower MOV pc,r14 ; return The SUB instruction uses the value of c directly after the LDRB instruction that loads c. The ARM pipeline will stall for two cycles. Overall there are 11 execution cycles which are needed to execute the whole program.  There are two ways you can alter the structure of the algorithm to avoid the cycles by using assembly -- load scheduling by preloading and unrolling.
  • 14. Load scheduling by preloading… Preloading: loads the data required for the loop at the end of the previous loop, rather than at the beginning of the current loop. Since loop i is loading data for loop i+1, there is always a problem with the first and last loops. For the first loop, insert an extra load outside the loop. For the last loop, be careful not to read any data. This can be effectively done by conditional execution. This reduces the loop from 11 cycles per character to 9 cycles per character on an ARM, giving a 1.22 times speed improvement.
  • 15. Load scheduling by preloading…
  • 16. Load scheduling by unrolling… Unroll and interleave the body of the loop. For example, we can perform three loops together. When the result of an operation from loop i is not ready, we can perform an operation from loop i+1 that avoids waiting for the loop i result. This loop is the most efficient implementation we’ve looked at so far. The implementation requires seven cycles per character on ARM. This gives a 1.57 times speed increase over the original str_tolower.
  • 17. Load scheduling by unrolling…
  • 18. Load scheduling by unrolling…
  • 19. Load scheduling by unrolling…  More than doubling the code size  Only efficient for a large data size