Understanding the Tomasulo Algorithm
Yichao Cheng
Jul 23, 2013
Background
• IBM System/360 Model 91
• The FPU’s add/multiply/divide take 2/3/13 cycles, respectively
• Can performance be improved by using multiple execution units?
[Figure: FPU with separate Adder, Mul and Div units]
Major Contributions
Tomasulo’s paper proposed three innovative mechanisms:
• Common data busing (CDB)
• Register tagging scheme
• Reservation stations
which permit:
• Out-of-order execution of independent instructions
• while preserving the essential precedences in the instruction stream
Doubt
• When people talk about the Tomasulo algorithm, they talk about register renaming
• However, this term cannot be found in the original paper
How could anyone invent a thing without noticing it?
Architecture Overview
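[Figure: the floating-point unit (FPU) with the floating-point operation stack (FLOS), Decoder, floating-point registers (FLR), floating-point buffers (FLB), store data buffers (SDB), and the Adder, Mul and Div units; the Instruction Unit and Storage sit outside the FPU]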
From the FPU’s perspective
All instructions are ‘register-to-register’:
• Register-to-register arithmetic
• Storage-to-register arithmetic
• Load
• Store
The Instruction Unit (outside the FPU) is in charge of address generation and memory access, so memory operands reach the FPU through the FLB and store data leaves through the SDB.
‘Sink’ and ‘source’
• Equivalent to destination and source
• For example, in AD R1, R2 (R1 ← R1 + R2), R1 is both a sink and a source
[Figure: a value travelling from its source to its sink]
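To make the sink/source terminology concrete, here is a minimal Python sketch that decodes the slides’ two-operand notation into one sink and its sources. The `Operands` record and the `operands` function are hypothetical names for illustration; the rules only restate the slides: arithmetic `OP a, b` computes a ← a op b, `LD R1, FLB1` writes R1 from a buffer, and `STD R1, SDB1` copies R1 into a store buffer.

```python
from typing import NamedTuple, Tuple


class Operands(NamedTuple):
    sink: str                 # where the result goes (destination)
    sources: Tuple[str, ...]  # what the instruction reads


def operands(op: str, a: str, b: str) -> Operands:
    """Hypothetical decoder for the slides' two-operand form `OP a, b`."""
    if op == "LD":   # LD R1, FLB1: R1 <- FLB1; the old R1 is not read
        return Operands(sink=a, sources=(b,))
    if op == "STD":  # STD R1, SDB1: SDB1 <- R1; no FLR register is written
        return Operands(sink=b, sources=(a,))
    # Arithmetic such as AD or MD: a <- a op b, so `a` is both sink and source
    return Operands(sink=a, sources=(a, b))


print(operands("AD", "R1", "R2"))  # Operands(sink='R1', sources=('R1', 'R2'))
```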
The four instruction types, each shown on the FPU datapath (FLOS, Decoder, FLR, FLB, SDB, Adder, Mul, Div, Storage):
1. Reg-to-reg arithmetic: AD R1, R2
2. Storage-to-reg arithmetic: AD R1, FLB
3. Load: LD R1, FLB1
4. Store: STD R1, SDB1
Timing sequences (IU = Instruction Unit, EU = execution unit):
1. Reg-to-reg arithmetic: IU decode → EU decode → 2 operands to ALU → execute → write back to FLR
2. Storage-to-reg arithmetic: IU decode → address generation → memory read → FLB operand to ALU; EU decode → FLR operand to ALU; then execute → write back to FLR
3. Load: same stages as storage-to-reg arithmetic
4. Store: IU decode → address generation; EU decode → FLR operand to ALU → execute → write to SDB → memory write
A Day in the Life of ‘LD R1, addr’
1. The Instruction Unit decodes the instruction and generates the address.
2. addr is sent to Storage, and the loaded data arrives in floating-point buffer FLB1.
3. The Instruction Unit hands the FPU the rewritten instruction LD R1, FLB1, which enters the FLOS.
4. The Decoder takes LD R1, FLB1 from the FLOS and sends the operation (OP) on toward the Adder.
5. The value is written into R1 in the FLR.
[Animation frames: FPU datapath with FLOS, Decoder, FLR, FLB, SDB, Adder, Mul, Div, Instruction Unit and Storage]
An Example of Dependence
LD F0, FLB1
MD F0, FLB2
• What if we send them to different execution units at the same time to exploit parallelism?
• The result (F0) cannot reflect the effect of LD, because MD uses the old value of F0.
• This is called a true dependence, a.k.a. RAW (read after write).
[Figure: the two instructions sent to different units among Adder, Mul and Div]
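The hazard can be reproduced in a few lines of Python. This is a sketch with made-up contents (F0 = 1.0, FLB1 = 3.0, FLB2 = 2.0, chosen only for illustration); `Instr` and `raw_hazard` are hypothetical helpers. Running both instructions "at the same time" means MD reads F0 before LD has written it.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    sink: str
    sources: Tuple[str, ...]


def raw_hazard(earlier: Instr, later: Instr) -> bool:
    """Read after write: `later` reads a register that `earlier` writes."""
    return earlier.sink in later.sources


LD = Instr("LD F0, FLB1", sink="F0", sources=("FLB1",))
MD = Instr("MD F0, FLB2", sink="F0", sources=("F0", "FLB2"))

# Made-up contents, chosen only to make the difference visible.
init = {"F0": 1.0, "FLB1": 3.0, "FLB2": 2.0}

# Program order: MD sees the F0 written by LD.
seq = dict(init)
seq["F0"] = seq["FLB1"]              # LD F0, FLB1
seq["F0"] = seq["F0"] * seq["FLB2"]  # MD F0, FLB2

# Both sent out at once: MD reads F0 before LD has written it.
par = dict(init)
md_result = par["F0"] * par["FLB2"]  # MD uses the old F0
par["F0"] = par["FLB1"]              # LD finishes
par["F0"] = md_result                # MD finishes last and overwrites F0

print(raw_hazard(LD, MD), seq["F0"], par["F0"])  # True 6.0 2.0
```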
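A Simple Solution
• ‘Busy’ bit scheme: each register has a busy bit B, set while the register is the pending sink of an in-flight instruction
LD R1, B
MD R1, A
[Figure: registers R0 to R3 with busy bits. R1, busy: “I’m already the sink of some instruction.” MD R1, A: “I need your content.”]
MD R1, A must wait until LD R1, B writes R1 and its busy bit clears.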
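A minimal sketch of the busy-bit check, assuming (as the slide suggests) one flag per register that is set at issue and cleared at write-back. `BusyBitRegs` is a hypothetical class; for simplicity it checks every register the instruction names and treats memory operands such as A and B as always available.

```python
class BusyBitRegs:
    """Sketch of the busy-bit scheme: one flag per register, set while an
    in-flight instruction has that register as its sink."""

    def __init__(self, names):
        self.busy = {name: False for name in names}

    def can_issue(self, sink, sources):
        # Check every register the instruction names; memory operands such
        # as A or B are not in the register file and are ignored.
        return not any(self.busy.get(r, False) for r in (sink, *sources))

    def issue(self, sink):
        self.busy[sink] = True   # "I'm already the sink of some instruction"

    def complete(self, sink):
        self.busy[sink] = False  # result written back; waiters may proceed


regs = BusyBitRegs(["R0", "R1", "R2", "R3"])
regs.issue("R1")                          # LD R1, B is in flight
print(regs.can_issue("R1", ("R1", "A")))  # MD R1, A must wait: False
regs.complete("R1")                       # LD writes R1
print(regs.can_issue("R1", ("R1", "A")))  # now True
```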
Performance Degrades...
• When the code keeps using one register
• E.g.
MD F0, E
AD F2, F0
AD F4, A
AD F2, F4
Overlap fails because the first AD depends on MD and holds up decoding, even though the later instructions do not depend on MD. The second AD (AD F4, A) is qualified to issue, yet it has to wait.
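The stall can be quantified with a toy cycle count. This sketch assumes one instruction decoded per cycle, strictly in order, the 2-cycle add and 3-cycle multiply latencies from the Background slide, a result usable in the cycle its latency ends, and no limit on execution units; `in_order_issue` is a hypothetical helper.

```python
# Latencies from the Background slide (add 2 cycles, multiply 3 cycles).
LATENCY = {"AD": 2, "MD": 3}
FLR = {"F0", "F2", "F4", "F6"}  # floating-point registers; E and A are memory operands


def in_order_issue(program):
    """Cycle at which each instruction issues with busy bits and strictly
    in-order decode (at most one instruction decoded per cycle)."""
    ready_at = {}  # register -> cycle its pending write completes (busy bit clears)
    cycle, issue_cycles = 0, []
    for op, sink, src in program:
        regs = [r for r in (sink, src) if r in FLR]
        # Decode stalls here, and every instruction behind it stalls too.
        cycle = max([cycle] + [ready_at.get(r, 0) for r in regs])
        issue_cycles.append(cycle)
        ready_at[sink] = cycle + LATENCY[op]
        cycle += 1
    return issue_cycles


program = [("MD", "F0", "E"), ("AD", "F2", "F0"), ("AD", "F4", "A"), ("AD", "F2", "F4")]
print(in_order_issue(program))  # [0, 3, 4, 6]
```

AD F4, A has no dependence on MD, yet it cannot issue before cycle 4 because decoding is blocked behind AD F2, F0.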
Cause of the Problem
• If one instruction gets stuck (due to a dependence), the following instructions can’t be decoded, even if they are qualified to issue
Solution:
• Decouple dependence maintenance from decoding
• Look ahead at more instructions for concurrency
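Dispatch and Issue Decoupling
MD F0, E
AD F2, F0
AD F4, A
AD F2, F4
• Coupled: the decoder asks “Can it issue? Is that register busy?” and holds the instruction until the answer is yes.
• Decoupled: the decoder dispatches the instruction to the Adder anyway, and the waiting instruction asks “Are my operands ready?” before it issues.
[Figure: the Decoder feeding the Adder; MD F0, E asks “Can I issue?”]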
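Under the same toy assumptions (one dispatch per cycle, 2-cycle add, 3-cycle multiply, no structural limits), decoupling can be sketched by letting decode always dispatch and recording, per register, which earlier instruction will produce it. `decoupled_issue` is a hypothetical helper, not the Model 91’s actual control logic.

```python
LATENCY = {"AD": 2, "MD": 3}
FLR = {"F0", "F2", "F4", "F6"}  # floating-point registers; E and A are memory operands


def decoupled_issue(program):
    """Dispatch one instruction per cycle without ever stalling decode; each
    instruction then issues from its station as soon as its operands are ready."""
    producer = {}   # register -> index of the latest dispatched instruction writing it
    done_at = []    # cycle at which each instruction's result is available
    issue_cycles = []
    for i, (op, sink, src) in enumerate(program):
        # Arithmetic reads its sink too (a <- a op b), plus the second operand.
        waits = [done_at[producer[r]] for r in (sink, src) if r in FLR and r in producer]
        issue = max([i] + waits)  # dispatched at cycle i, issues when operands are ready
        issue_cycles.append(issue)
        done_at.append(issue + LATENCY[op])
        producer[sink] = i        # later readers of `sink` wait on this instruction
    return issue_cycles


program = [("MD", "F0", "E"), ("AD", "F2", "F0"), ("AD", "F4", "A"), ("AD", "F2", "F4")]
print(decoupled_issue(program))  # [0, 3, 2, 5]
```

Here AD F4, A issues at cycle 2, while AD F2, F0 is still waiting for MD’s result.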
An Example of True Dependence
LD F0, FLB1   (F0 as sink)
AD F2, F0     (F0 as source)
Assume the CDB has not been introduced yet.
[Figure: Adder, Mul and Div units, FLB (holding FLB1) and FLR (holding F0)]
1. LD F0, FLB1 dispatches to A1. F0’s busy bit is set and it is tagged A1: F0 is reserved for some instruction, and its content is calculated by A1.
2. AD F2, F0 needs the value of F0, but F0 seems to be busy.
3. AD F2, F0 dispatches to A2. Since A1 is the producer, AD simply asks A1 for the value: its F0 operand is replaced by the tag, giving AD F2, A1.
4. LD F0, FLB1 executes: its operands are ready.
5. LD F0, FLB1 broadcasts its result: “I’m A1. Who needs my result? Over.” AD F2, A1 answers “I depend on A1!”, and F0 in the FLR, also tagged A1, answers “Me too!”
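The walk-through above can be sketched in Python. `Operand`, `Register` and the register contents are hypothetical, and the station names A1/A2 follow the slides; the point is that a busy register hands out its producer’s tag, and a broadcast is accepted only where the tag matches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    """A reservation-station operand: either a value or the tag of its producer."""
    value: Optional[float] = None
    tag: Optional[str] = None

    def ingate(self, tag: str, value: float) -> None:
        # Accept a broadcast only from the producer this operand waits for.
        if self.value is None and self.tag == tag:
            self.value, self.tag = value, None


@dataclass
class Register:
    """A floating-point register with its busy bit and tag."""
    value: float
    busy: bool = False
    tag: Optional[str] = None

    def ingate(self, tag: str, value: float) -> None:
        if self.busy and self.tag == tag:
            self.value, self.busy, self.tag = value, False, None

    def read(self) -> Operand:
        # A busy register hands out its producer's tag instead of a stale value.
        return Operand(tag=self.tag) if self.busy else Operand(value=self.value)


flr = {"F0": Register(1.0), "F2": Register(10.0)}  # hypothetical contents
flb1 = 3.0                                         # hypothetical loaded value

# LD F0, FLB1 dispatches to A1: F0 becomes busy with tag A1.
flr["F0"].busy, flr["F0"].tag = True, "A1"

# AD F2, F0 dispatches to A2: it gets F2's value but only A1's tag for F0.
a2 = [flr["F2"].read(), flr["F0"].read()]
flr["F2"].busy, flr["F2"].tag = True, "A2"

# A1 broadcasts <A1, 3.0>: A2 ("I depend on A1!") and F0 ("Me too!") ingate it.
for consumer in (*a2, flr["F0"]):
    consumer.ingate("A1", flb1)

# A2 now has both operands, executes, and returns its result the same way.
flr["F2"].ingate("A2", a2[0].value + a2[1].value)
print(flr["F0"].value, flr["F2"].value)  # 3.0 13.0
```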
The Role of CDB
• The Common Data Bus is in charge of value forwarding
• In the reg-to-reg model, a value is passed through a register (write, then read)
[Figure: the producer writes F0 as a sink; the consumer then reads F0 as a source]
[Figure: reservation stations for the Adder and for Mul, plus FLB, SDB and FLR, all attached to the CDB]
• Loads and stores no longer need to go through an ALU
• Dependence management is decoupled from execution, as intended
The CDB connects all units that can alter a register (producers) to all units that may take a register as an operand (consumers):
• Producers: FLB (6), Adder reservation stations (3), Mul reservation stations (2)
• Consumers: FLR (4), SDB (3), Adder reservation stations (3 × 2 operands), Mul reservation stations (2 × 2 operands)
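The counts in the figure also fix how large a tag must be. A short calculation, assuming one extra code is reserved to mean "holding a value, not waiting" and one tag comparison per consumer slot (both are assumptions of this sketch, not statements from the slides):

```python
import math

# Counts as read from the figure above.
producers = {"FLB": 6, "Adder stations": 3, "Mul stations": 2}
consumers = {"FLR": 4, "SDB": 3, "Adder stations": 3 * 2, "Mul stations": 2 * 2}

n_tags = sum(producers.values())             # each producer needs its own tag
tag_bits = math.ceil(math.log2(n_tags + 1))  # +1 for the assumed "not waiting" code
n_comparators = sum(consumers.values())      # each consumer slot compares tags

print(n_tags, tag_bits, n_comparators)  # 11 4 17
```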
The Implementation of CDB
• A consumer recognizes its producer by a tag
• Producers take turns putting <tag, value> on the bus (each makes a request first)
• If the tag matches, the consumer ingates the value
[Figure: six producers (P) and six consumers (C) on the bus; the consumers hold tags, one waiting for X and two for Y. After a request (2 cycles), Y puts its value on the bus; X then requests and puts its value on the bus.]
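A minimal sketch of this broadcast, assuming first-come, first-served arbitration and ignoring the request latency shown on the slide. `run_cdb` and the values are hypothetical; the example mirrors the figure, with one consumer waiting for X and two for Y.

```python
from collections import deque


def run_cdb(requests, waiting_on):
    """Sketch of CDB broadcasting.

    requests:   (tag, value) pairs in the order producers asked for the bus
    waiting_on: consumer name -> tag of the producer it needs
    Returns consumer name -> (cycle, value) for each consumer that ingated.
    """
    queue = deque(requests)
    received = {}
    cycle = 0
    while queue:
        tag, value = queue.popleft()  # one producer owns the bus per cycle
        for consumer, wanted in waiting_on.items():
            if wanted == tag and consumer not in received:
                received[consumer] = (cycle, value)  # tag match: ingate
        cycle += 1
    return received


# Hypothetical example modelled on the figure: C4 waits for X, C5 and C6 for Y.
print(run_cdb([("Y", 2.5), ("X", 7.0)], {"C4": "X", "C5": "Y", "C6": "Y"}))
```

One broadcast satisfies every consumer with a matching tag in the same cycle, which is why C5 and C6 both receive Y’s value at cycle 0.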
The Principle behind the Scene
• A tag is a pointer to the producer of the value the current instruction needs
• These pointers capture the dependence information that the reg-reg model hides (discussed later)
• With this information, the order of execution can be resolved
• The CDB enables ‘producer-consumer’ style data flow
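Treating tags as pointers to producers, the hidden dependence graph can be recovered by remembering, for each register, the last instruction that writes it. This sketch uses the four-instruction sequence from the false-dependence example; `dataflow_edges` and the names I1 to I4 are hypothetical.

```python
def dataflow_edges(program):
    """program: list of (name, sink, sources). Returns (producer, consumer)
    edges found by tagging each source with the last instruction writing it."""
    tag_of = {}  # register -> name of its latest producer (the "tag")
    edges = []
    for name, sink, sources in program:
        for reg in sources:
            if reg in tag_of:
                edges.append((tag_of[reg], name))
        tag_of[sink] = name  # later readers of `sink` point here
    return edges


program = [
    ("I1", "F0", ["FLB1"]),      # LD F0, FLB1
    ("I2", "F2", ["F2", "F0"]),  # AD F2, F0
    ("I3", "F0", ["FLB2"]),      # LD F0, FLB2
    ("I4", "F3", ["F3", "F0"]),  # AD F3, F0
]
print(dataflow_edges(program))  # [('I1', 'I2'), ('I3', 'I4')]
```

Only the two true (RAW) edges appear; the WAR and WAW relations on F0 produce no edges, so the two LD/AD pairs are free to overlap.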
An Example for False Dependence
LD F0, FLB1
AD F2, F0
LD F0, FLB2
AD F3, F0
[Figure: Adder, Mul and Div units, FLB (FLB1, FLB2) and FLR (F0)]
The second LD has a WAW dependence on the first LD (both write F0) and a WAR dependence on the first AD (which reads F0).
1. LD F0, FLB1 dispatches: F0 is marked busy with tag FLB1.
2. AD F2, F0 dispatches to A1 and waits on tag FLB1 for F0.
3. LD F0, FLB2 dispatches: F0’s tag changes to FLB2.
4. AD F3, F0 dispatches to A2 and waits on tag FLB2 for F0.
Keep tracing the source of the value instead of the register holding it. There is no need to rename a register (naming is just a way of referring to values).
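The same sequence can be run with tags in a short simulation. Register contents and loaded values are made up; as on the slides, a load’s tag is its FLB number and the adds sit in stations A1 and A2. The second load is deliberately completed first to show that neither the WAW nor the WAR relation on F0 produces a wrong result.

```python
FLB = {"FLB1": 1.0, "FLB2": 2.0}             # hypothetical values loaded from storage
value = {"F0": 0.0, "F2": 10.0, "F3": 20.0}  # hypothetical register contents
tag = {reg: None for reg in value}           # producer tag per register (None = ready)
stations = {}                                # station -> [operand, operand]


def read(reg):
    # A ready register gives its value; a busy one gives ("tag", producer).
    return value[reg] if tag[reg] is None else ("tag", tag[reg])


def broadcast(producer, result):
    # CDB: every station operand and register waiting on `producer` ingates.
    for ops in stations.values():
        for i, op in enumerate(ops):
            if op == ("tag", producer):
                ops[i] = result
    for reg, pending in tag.items():
        if pending == producer:
            value[reg], tag[reg] = result, None


# Dispatch in program order; the FLB number is the load's tag.
tag["F0"] = "FLB1"                         # LD F0, FLB1
stations["A1"] = [read("F2"), read("F0")]  # AD F2, F0 waits on FLB1
tag["F2"] = "A1"
tag["F0"] = "FLB2"                         # LD F0, FLB2 (WAW and WAR on F0)
stations["A2"] = [read("F3"), read("F0")]  # AD F3, F0 waits on FLB2
tag["F3"] = "A2"

# Out of order: the second load's data arrives first.
broadcast("FLB2", FLB["FLB2"])
broadcast("FLB1", FLB["FLB1"])

# Both adds now have their operands, execute, and broadcast under their station tags.
for station, (a, b) in stations.items():
    broadcast(station, a + b)

print(value)  # {'F0': 2.0, 'F2': 11.0, 'F3': 22.0}
```

The final values match sequential execution (F0 = 2.0, F2 = 10 + 1, F3 = 20 + 2) without renaming any register; only tags were tracked.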
Timing Sequence with Busy Bit
[Timing diagram for LD F0, FLB1 / AD F2, F0 / LD F0, FLB2 / AD F3, F0, using stages D, AG, FLB, T, EX and WB]
Timing Sequence with Reservation Station
[Timing diagram for the same four instructions with reservation stations]
The Side Effect of Register Machines
• What are the differences between a circuit and a register machine?
Register machine:
• General purpose
• Control-driven
• Implicit dependence via registers
Circuit:
• Special purpose
• Data-driven
• Exposed dependence
...But registers are scarce, so the same few registers get reused for unrelated values
Conclusion
• The Tomasulo algorithm has nothing to do with register renaming
• It resolves WAR and WAW hazards by eliminating the side effect of using registers to pass values
• With the Tomasulo algorithm, program execution is driven by data flow, thus exploiting maximum concurrency

Contenu connexe

Tendances

Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AISeth Grimes
 
Iot presentation
Iot presentationIot presentation
Iot presentationhuma742446
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
An Introduction to IoT: Connectivity & Case Studies
An Introduction to IoT: Connectivity & Case StudiesAn Introduction to IoT: Connectivity & Case Studies
An Introduction to IoT: Connectivity & Case Studies3G4G
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdfQualcomm Research
 
Music Recommendation 2018
Music Recommendation 2018Music Recommendation 2018
Music Recommendation 2018Fabien Gouyon
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Krishnaram Kenthapadi
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Anoop Deoras
 
Internet of Things: state of the art
Internet of Things: state of the artInternet of Things: state of the art
Internet of Things: state of the artMario Kušek
 
Fedarated learning
Fedarated learningFedarated learning
Fedarated learningVaishakhKP1
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsJustin Basilico
 
Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021
Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021
Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021Mirko Marras
 
Graph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsGraph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsWQ Fan
 
Embedded system for traffic light control
Embedded system for traffic light controlEmbedded system for traffic light control
Embedded system for traffic light controlMadhu Prasad
 
LLM Healthcare.pdf
LLM Healthcare.pdfLLM Healthcare.pdf
LLM Healthcare.pdfATPowr
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language ProcessingYunyao Li
 
Difference between Artificial Intelligence, Machine Learning, Deep Learning a...
Difference between Artificial Intelligence, Machine Learning, Deep Learning a...Difference between Artificial Intelligence, Machine Learning, Deep Learning a...
Difference between Artificial Intelligence, Machine Learning, Deep Learning a...Sanjay Srivastava
 

Tendances (20)

Explainable AI (XAI)
Explainable AI (XAI)Explainable AI (XAI)
Explainable AI (XAI)
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
 
Iot presentation
Iot presentationIot presentation
Iot presentation
 
AWS for IoT
AWS for IoTAWS for IoT
AWS for IoT
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
An Introduction to IoT: Connectivity & Case Studies
An Introduction to IoT: Connectivity & Case StudiesAn Introduction to IoT: Connectivity & Case Studies
An Introduction to IoT: Connectivity & Case Studies
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdf
 
Music Recommendation 2018
Music Recommendation 2018Music Recommendation 2018
Music Recommendation 2018
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 
Internet of Things: state of the art
Internet of Things: state of the artInternet of Things: state of the art
Internet of Things: state of the art
 
Fedarated learning
Fedarated learningFedarated learning
Fedarated learning
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021
Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021
Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021
 
Chatgpt ppt
Chatgpt  pptChatgpt  ppt
Chatgpt ppt
 
Graph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsGraph Neural Networks for Recommendations
Graph Neural Networks for Recommendations
 
Embedded system for traffic light control
Embedded system for traffic light controlEmbedded system for traffic light control
Embedded system for traffic light control
 
LLM Healthcare.pdf
LLM Healthcare.pdfLLM Healthcare.pdf
LLM Healthcare.pdf
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language Processing
 
Difference between Artificial Intelligence, Machine Learning, Deep Learning a...
Difference between Artificial Intelligence, Machine Learning, Deep Learning a...Difference between Artificial Intelligence, Machine Learning, Deep Learning a...
Difference between Artificial Intelligence, Machine Learning, Deep Learning a...
 

Similaire à Understanding Tomasulo Algorithm

Chuẩn hóa CSDL
Chuẩn hóa CSDLChuẩn hóa CSDL
Chuẩn hóa CSDLphananhvu
 
Basic programming of 8085
Basic programming of 8085 Basic programming of 8085
Basic programming of 8085 vijaydeepakg
 
Instruction_Set.pdf
Instruction_Set.pdfInstruction_Set.pdf
Instruction_Set.pdfboukomra
 
80386 microprocessor system instruction
80386 microprocessor system instruction80386 microprocessor system instruction
80386 microprocessor system instructionUmesh Talware
 
Tomasulo Algorithm
Tomasulo AlgorithmTomasulo Algorithm
Tomasulo AlgorithmFarwa Ansari
 
Open-DO Update
Open-DO UpdateOpen-DO Update
Open-DO UpdateAdaCore
 
PIC Instructions.pptx
PIC Instructions.pptxPIC Instructions.pptx
PIC Instructions.pptxAltaafMulani
 
Federated SPARQL Query Processing With Replicated Fragment
Federated SPARQL Query Processing With Replicated FragmentFederated SPARQL Query Processing With Replicated Fragment
Federated SPARQL Query Processing With Replicated FragmentPascal Molli
 
Comparing IDL to C++ with IDL to C++11
Comparing IDL to C++ with IDL to C++11Comparing IDL to C++ with IDL to C++11
Comparing IDL to C++ with IDL to C++11Remedy IT
 
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Hsien-Hsin Sean Lee, Ph.D.
 

Similaire à Understanding Tomasulo Algorithm (16)

Chuẩn hóa CSDL
Chuẩn hóa CSDLChuẩn hóa CSDL
Chuẩn hóa CSDL
 
Al2ed chapter18
Al2ed chapter18Al2ed chapter18
Al2ed chapter18
 
Instructions
InstructionsInstructions
Instructions
 
Basic programming of 8085
Basic programming of 8085 Basic programming of 8085
Basic programming of 8085
 
VSE/POWER, all the news since z/VSE 4.2
VSE/POWER, all the news since z/VSE 4.2VSE/POWER, all the news since z/VSE 4.2
VSE/POWER, all the news since z/VSE 4.2
 
Instruction set
Instruction setInstruction set
Instruction set
 
Instruction_Set.pdf
Instruction_Set.pdfInstruction_Set.pdf
Instruction_Set.pdf
 
80386 microprocessor system instruction
80386 microprocessor system instruction80386 microprocessor system instruction
80386 microprocessor system instruction
 
Tomasulo Algorithm
Tomasulo AlgorithmTomasulo Algorithm
Tomasulo Algorithm
 
BCNF
BCNFBCNF
BCNF
 
Open-DO Update
Open-DO UpdateOpen-DO Update
Open-DO Update
 
Assembler
AssemblerAssembler
Assembler
 
PIC Instructions.pptx
PIC Instructions.pptxPIC Instructions.pptx
PIC Instructions.pptx
 
Federated SPARQL Query Processing With Replicated Fragment
Federated SPARQL Query Processing With Replicated FragmentFederated SPARQL Query Processing With Replicated Fragment
Federated SPARQL Query Processing With Replicated Fragment
 
Comparing IDL to C++ with IDL to C++11
Comparing IDL to C++ with IDL to C++11Comparing IDL to C++ with IDL to C++11
Comparing IDL to C++ with IDL to C++11
 
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec7 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
 

Dernier

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Understanding Tomasulo Algorithm

  • 1. Understanding the Tomasulo Algorithm Yichao Cheng Jul 23, 2013
  • 2. Background  IBM System/360 Model 91  FPU’s add/mul/div takes 2/3/13 cycles  Can performance be improved through utilizing multiple execution units? Adder Mul div
  • 3. Major Contributions Proposed three innovative mechanisms:  Common data busing(CDB)  Register tagging scheme  Reservation station which permits:  Out-of-order execution of independent instructions  while preserving the essential precedences in the instruction stream
  • 4. Doubt  When people talk about Tomasolu algorithm, they talk about register renaming  However this word can’t be found in the original paper How could anyone invent a thing without noticing it?
  • 6. From a FPU’s perspective All instructions are ‘register-to-register’  Register-to-register arithmetic  Storage-to-register arithmetic  Load  Store Instruction Unit(outside FPU) is in charge of the address generation and memory access.
  • 7.  Be equivalent to destination and source  For example, AD R1, R2  R1 is both a sink and a source ‘sink’ and ‘source’ source sink value
  • 8. 1.Reg-to-reg arithmetic AD R1, R2 FLOS Adder Mul div FLB SDB FLR Decoder Storage
  • 9. 2.Storage-to-reg arithmetic AD R1, FLB FLOS Mul divSDB Decoder Storage Adder FLR FLB
  • 10. 3.Load LD R1, FLB1 FLOS Adder Mul div FLB SDB FLR Decoder Storage 0
  • 11. 4.Store STD R1, SDB1 FLOS Mul div FLB Decoder Storage FLR AdderSDB 0
  • 12. Timing Sequence: 1. reg-to reg arithmetic DecodeIU EU Execute Write back to FLR 2 operands To ALU Decode
  • 13. 2. storage-to-reg arithmetic DecodeIU EU Execute Write back to FLR FLR To ALU Decode FLB To ALU Addr Gen Mem Read
  • 14. 3.Load DecodeIU EU Execute Writeback to FLR FLR To ALU Decode FLB To ALU Addr Gen Mem Read
  • 16. A Day in the Life of ‘LD R1, addr’ FLOS Adder Mul div FLB SDB FLR Decoder Storage Instruction Unit
  • 17. FLBStorage FLOS Adder Mul divSDB Decoder FLB1 addr FLR Decode & Address generation A Day in the Life of ‘LD R1, addr’ Instruction Unit
  • 18. A Day in the Life of 'LD R1, addr': the Instruction Unit passes the rewritten instruction LD R1, FLB1 to the FPU.
  • 19. A Day in the Life of 'LD R1, addr': LD R1, FLB1 waits in the FLOS.
  • 20. A Day in the Life of 'LD R1, addr': the decoder decodes LD R1, FLB1 and sends the operation (OP) to the adder.
  • 21. A Day in the Life of 'LD R1, addr': the operation reaches the adder, which reads its operand from FLB1.
  • 22. A Day in the Life of 'LD R1, addr': the result is written into R1 in the FLR.
  • 23. An Example of Dependence: LD F0, FLB1 followed by MD F0, FLB2. What if we send them to different execution units (the adder and the multiplier) at the same time to exploit parallelism?
  • 24. An Example of Dependence: the result (F0) cannot reflect the effect of LD, because MD uses the old value of F0.
  • 25. An Example of Dependence: this is called a true dependence, a.k.a. RAW (read after write).
  • 26. A Simple Solution, the 'busy' bit scheme: each register (R0 to R3) has a busy bit B. LD R1 sets R1's busy bit ('I'm already the sink of some instruction'), so MD R1 must wait ('I need your content').
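A minimal Python sketch of the busy-bit check on slide 26, assuming the decoder inspects every register an instruction names (its sink and its register sources). The function names `can_issue`, `mark_busy`, and `write_back` are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of the busy-bit scheme: one busy bit per register,
# and an instruction may not go while any register it names is busy.

REGISTERS = ("R0", "R1", "R2", "R3")


def can_issue(sink, sources, busy):
    """True if neither the sink nor any source register is marked busy."""
    return not busy[sink] and not any(busy[r] for r in sources)


def mark_busy(sink, busy):
    busy[sink] = True  # "I'm already the sink of some instruction"


def write_back(sink, busy):
    busy[sink] = False  # result delivered, register is free again


busy = {r: False for r in REGISTERS}

# LD R1, ...: R1 is free, so the load issues and reserves R1.
ld_ok = can_issue("R1", (), busy)
mark_busy("R1", busy)

# MD R1, ...: R1 is both sink and source, and it is busy, so MD must wait.
md_waits = not can_issue("R1", ("R1",), busy)

# Once the load writes back, MD may issue.
write_back("R1", busy)
md_ok_later = can_issue("R1", ("R1",), busy)
```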
  • 27. Performance Degrades... when the code keeps using one register, e.g. MD F0, E; AD F2, F0; AD F4, A; AD F2, F4. Overlap fails because the first AD depends on MD. The others do not depend on MD, and the second AD (AD F4, A) is qualified to issue, yet it still waits.
  • 28. Cause of the Problem: if one instruction gets stuck (due to a dependence), the following ones cannot be decoded, even if they are qualified to issue. Solution: decouple dependence maintenance from decoding, and look ahead at more instructions for concurrency.
  • 29. Dispatch and Issue Decoupling: for MD F0, E; AD F2, F0; AD F4, A; AD F2, F4, the decoder first asks 'Is that register busy?' and passes an instruction to the adder only if it can issue.
  • 30. Dispatch and Issue Decoupling: instead, the decoder dispatches anyway, and each instruction waiting at the adder asks 'Are my operands ready?' to decide whether it can issue.
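The contrast between slides 27 and 30 can be sketched in Python. This is a snapshot taken when MD F0, E has started and nothing has completed yet; it assumes unlimited waiting stations and treats the storage operands E and A as always ready. `blocking_decode` and `decoupled_dispatch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    sink: str
    sources: tuple  # register sources only; storage operands (E, A) assumed ready


# AD F2, F0 means F2 <- F2 + F0, so F2 is both sink and source.
PROGRAM = [
    Instr("MD F0, E", "F0", ("F0",)),
    Instr("AD F2, F0", "F2", ("F2", "F0")),
    Instr("AD F4, A", "F4", ("F4",)),
    Instr("AD F2, F4", "F2", ("F2", "F4")),
]


def blocking_decode(program):
    """Busy-bit decoder: stops at the first instruction touching a busy register."""
    busy, started = set(), []
    for ins in program:
        if busy & ({ins.sink} | set(ins.sources)):
            break  # stuck, and nothing behind it can be decoded
        started.append(ins.text)
        busy.add(ins.sink)  # result still outstanding
    return started


def decoupled_dispatch(program):
    """Dispatch everything; an instruction starts only if its sources are ready."""
    pending, started, waiting = set(), [], []
    for ins in program:
        if pending & set(ins.sources):
            waiting.append(ins.text)  # "Are my operands ready?" Not yet.
        else:
            started.append(ins.text)
        pending.add(ins.sink)  # its result is now outstanding too
    return started, waiting
```

Here `blocking_decode(PROGRAM)` returns `['MD F0, E']`, while `decoupled_dispatch(PROGRAM)` returns `(['MD F0, E', 'AD F4, A'], ['AD F2, F0', 'AD F2, F4'])`: once dispatch is separated from issue, AD F4, A no longer waits behind AD F2, F0.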
  • 31. An Example of True Dependence: LD F0, FLB1 (F0 as sink) followed by AD F2, F0 (F0 as source). Assume the CDB has not been introduced yet.
  • 32. LD F0, FLB1 dispatches to A1: F0's busy bit B is set, meaning F0 is reserved for some instruction.
  • 33. LD F0, FLB1 dispatches to A1: F0 is also tagged with A1, meaning its content will be calculated by A1.
  • 34. AD F2, F0 comes next: 'I need the value of F0, but it seems to be busy.'
  • 35. AD F2, F0 dispatches to A2: 'Since A1 is the producer, just let it tell me.'
  • 36. AD F2, F0 is held in A2 as AD F2, A1: the busy operand F0 is replaced by the producer tag A1 ('Since A1 is the producer, just ask it for the value').
  • 37. LD F0, FLB1 executes in A1 because its operands are ready, while AD F2, A1 keeps waiting.
  • 38. LD F0, FLB1 broadcasts its result over the air: 'I'm A1. Who needs my result? Over.'
  • 39. Every unit waiting on A1 responds: AD F2, A1 says 'I depend on A1!', and F0 in the FLR answers 'Me too!'
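Slides 31 to 39 can be condensed into a short Python sketch of tag forwarding. It is a simplification: the busy bit and tag are folded into one field, there is no timing, and the register contents (F0 = 0.0, F2 = 1.0) and loaded value (2.5) are hypothetical. `Operand`, `RegisterFile`, and `broadcast` are illustrative names, not the paper's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    value: Optional[float] = None
    tag: Optional[str] = None  # producing station; None once the value is present

    def ready(self):
        return self.tag is None


class RegisterFile:
    """FLR with the busy bit folded into a tag field (tag None = not busy)."""

    def __init__(self, values):
        self.value = dict(values)
        self.tag = {r: None for r in values}

    def read(self, reg):
        # A busy register hands out its producer's tag instead of a stale value.
        t = self.tag[reg]
        return Operand(tag=t) if t else Operand(value=self.value[reg])

    def reserve(self, reg, producer):
        self.tag[reg] = producer  # "its content is calculated by A1"


def broadcast(tag, value, stations, regs):
    """Put <tag, value> on the bus; every consumer holding that tag ingates it."""
    for operands in stations.values():
        for op in operands:
            if op.tag == tag:
                op.value, op.tag = value, None
    for reg, t in regs.tag.items():
        if t == tag:
            regs.value[reg], regs.tag[reg] = value, None


regs = RegisterFile({"F0": 0.0, "F2": 1.0})  # hypothetical initial contents
FLB1 = 2.5                                    # hypothetical loaded value
stations = {}

# LD F0, FLB1 dispatches to A1; its operand is already in the buffer.
stations["A1"] = [Operand(value=FLB1)]
regs.reserve("F0", "A1")

# AD F2, F0 dispatches to A2; F0 is busy, so A2 receives the tag A1.
stations["A2"] = [regs.read("F2"), regs.read("F0")]
regs.reserve("F2", "A2")
a2_waiting = not all(op.ready() for op in stations["A2"])

# A1 executes and broadcasts: "I'm A1. Who needs my result?"
broadcast("A1", FLB1, stations, regs)
a2_values = [op.value for op in stations["A2"]]
```

After the broadcast, A2 holds the operands `[1.0, 2.5]` and F0 in the register file is no longer busy, while F2 stays reserved for A2.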
  • 40. The Role of CDB: the Common Data Bus is in charge of value forwarding. In the reg-to-reg model, a value is passed through a register (a write followed by a read): first F0 is written as a sink (producer).
  • 41. The Role of CDB: then F0 is read as a source (consumer).
  • 42. The Role of CDB: [Diagram: reservation stations for the adder and for the multiplier/divider, plus FLB, SDB, and FLR, all connected by the CDB] Loads and stores no longer need to go through the ALU, and dependence management is decoupled from execution, as intended.
  • 43. The Role of CDB: the CDB connects producers (all units that can alter a register) to consumers (all units that may take a register as an operand). Producers: 3 adder reservation stations, 2 multiply/divide reservation stations, and 6 FLBs.
  • 44. The Role of CDB: consumers are the 4 FLRs, the 3 SDBs, the 2 multiply/divide reservation stations × 2 operands, and the 3 adder reservation stations × 2 operands.
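The producer and consumer counts on slides 43 and 44 also determine how wide a tag has to be. A small Python check, assuming that tag code 0 is reserved to mean "no tag, value present" (an assumption of this sketch, not a claim about the hardware encoding):

```python
# Unit counts as listed on slides 43 and 44.
PRODUCERS = {"adder stations": 3, "mul/div stations": 2, "FLB": 6}
CONSUMERS = {"FLR": 4, "SDB": 3, "mul/div stations": 2 * 2, "adder stations": 3 * 2}


def tag_width(n_producers):
    """Bits needed for codes 0..n, with code 0 assumed to mean 'no tag'."""
    return n_producers.bit_length()


n_p = sum(PRODUCERS.values())  # distinct sources a tag must name
n_c = sum(CONSUMERS.values())  # operand slots that must compare tags
width = tag_width(n_p)
```

With 11 producers and 17 consumer inputs, `width` comes out to 4 bits, and every one of the 17 consumer inputs needs a comparator of that width.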
  • 45. The Implementation of CDB: a consumer recognizes its producer by tag. Producers put <tag, value> on the bus in turns, after first making a request. If the tag matches, the consumer ingates the value. [Animation: six producers (P) and six consumers (C), three of which hold tags X, Y, and Y; producer Y makes a request (2 cycles)]
  • 46. The Implementation of CDB (continued): producer Y puts its value on the bus, and both consumers tagged Y ingate it.
  • 47. The Implementation of CDB (continued): producer X makes its request.
  • 48. The Implementation of CDB (continued): producer X puts its value on the bus, and the consumer tagged X ingates it.
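The request and broadcast protocol of slides 45 to 48 can be sketched as a cycle-by-cycle Python loop. The fixed two-cycle gap between a request and its bus transfer is an assumption read from the 'Request (2 cycles)' label, and the consumer names and values are hypothetical.

```python
from collections import deque

REQUEST_LEAD = 2  # assumption: a request raised in cycle t may use the bus from t + 2


def run_cdb(requests, consumers):
    """
    requests:  list of (request_cycle, tag, value) raised by producers.
    consumers: dict mapping a consumer name to the tag it is waiting for.
    One <tag, value> pair is driven onto the bus per cycle, in request order.
    Returns {consumer: (cycle_ingated, value)}.
    """
    pending = sorted(requests)
    queue, received = deque(), {}
    cycle, i = 0, 0
    while i < len(pending) or queue:
        # Requests become eligible REQUEST_LEAD cycles after being raised.
        while i < len(pending) and pending[i][0] + REQUEST_LEAD <= cycle:
            queue.append(pending[i])
            i += 1
        if queue:
            _, tag, value = queue.popleft()
            for name, wanted in consumers.items():
                if wanted == tag and name not in received:
                    received[name] = (cycle, value)  # tag matches: ingate the value
        cycle += 1
    return received


# The slides' scenario: consumers hold tags X, Y, Y; Y requests first, then X.
result = run_cdb([(0, "Y", 1.5), (1, "X", 4.0)], {"C1": "X", "C2": "Y", "C3": "Y"})
```

Both Y consumers ingate in the same cycle (cycle 2), and the X consumer one cycle later, because the bus carries a single pair per cycle.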
  • 49. The Principle behind the Scene: a tag is a pointer to the producer of the value required by the current instruction. These pointers make up the dependence information that the reg-reg model hides (discussed later). With this information, the order of execution can be resolved. The CDB enables 'producer-consumer' style data flow.
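Slide 49's point that tags are pointers to producers can be made concrete: replacing each register source with the index of its most recent writer turns the register program into a dataflow graph, and an execution order follows from that graph. A Python sketch under simplifying assumptions (unit latency, unlimited execution units), reusing the program from slide 27; the function names are hypothetical.

```python
def tag_program(program):
    """
    program: list of (text, sink, sources) in register form.
    For each instruction, return the producer index behind every source
    (None if the value is already sitting in the register).
    """
    last_writer, tags = {}, []
    for i, (_, sink, sources) in enumerate(program):
        tags.append(tuple(last_writer.get(r) for r in sources))
        last_writer[sink] = i  # later readers point at this instruction, not the register
    return tags


def dataflow_levels(tags):
    """Earliest wave each instruction could run in, given unit latency and free units."""
    levels = []
    for producers in tags:
        levels.append(max((levels[p] + 1 for p in producers if p is not None), default=0))
    return levels


PROGRAM = [
    ("MD F0, E", "F0", ("F0",)),
    ("AD F2, F0", "F2", ("F2", "F0")),
    ("AD F4, A", "F4", ("F4",)),
    ("AD F2, F4", "F2", ("F2", "F4")),
]
tags = tag_program(PROGRAM)     # [(None,), (None, 0), (None,), (1, 2)]
levels = dataflow_levels(tags)  # [0, 1, 0, 2]
```

MD F0, E and AD F4, A land in the same first wave, which is exactly the overlap the busy-bit decoder on slide 27 could not find.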
  • 50. An Example of False Dependence: LD F0, FLB1; AD F2, F0; LD F0, FLB2; AD F3, F0. The two loads both write F0 (WAW), and the second load writes F0 after AD F2, F0 reads it (WAR).
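The dependences named on slide 50 can be enumerated mechanically. A minimal Python classifier, assuming each instruction is given as (text, sink, register sources) and that FLB operands are buffers rather than registers; `classify_hazards` is a hypothetical name.

```python
def classify_hazards(program):
    """Return (kind, earlier_index, later_index) for each register-carried dependence."""
    last_writer = {}  # register -> index of its most recent writer
    readers = {}      # register -> indices that read its current value
    found = []
    for j, (_, sink, sources) in enumerate(program):
        for r in sources:
            if r in last_writer:
                found.append(("RAW", last_writer[r], j))  # true dependence
        for i in readers.get(sink, []):
            found.append(("WAR", i, j))  # j would overwrite a value i still reads
        if sink in last_writer:
            found.append(("WAW", last_writer[sink], j))  # two writers, one register
        for r in sources:
            if r != sink:
                readers.setdefault(r, []).append(j)
        readers[sink] = []  # a new value has no readers yet
        last_writer[sink] = j
    return found


PROGRAM = [
    ("LD F0, FLB1", "F0", ()),  # FLB1 is a buffer, not an FLR register
    ("AD F2, F0", "F2", ("F2", "F0")),
    ("LD F0, FLB2", "F0", ()),
    ("AD F3, F0", "F3", ("F3", "F0")),
]
```

For this program `classify_hazards(PROGRAM)` reports `[('RAW', 0, 1), ('WAR', 1, 2), ('WAW', 0, 2), ('RAW', 2, 3)]`; only the two RAW pairs are true dependences.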
  • 51. LD F0, FLB1 dispatches: F0 is marked busy (B) with tag FLB1.
  • 52. AD F2, F0 dispatches to A1, carrying F0's tag FLB1 instead of F0's value.
  • 53. State after the first two dispatches: AD F2, F0 waits in the adder's reservation station, and F0 is still busy with tag FLB1.
  • 54. LD F0, FLB2 dispatches: F0's tag is overwritten with FLB2, even though AD F2, F0 has not yet received its operand from FLB1.
  • 55. AD F3, F0 dispatches to A2, carrying F0's current tag FLB2.
  • 56. Each waiting instruction keeps tracing the source of the value instead of the register that holds it: A1 waits for FLB1, A2 waits for FLB2.
  • 57. There is no need to rename a register (naming is just a way of referring to values).
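Slides 51 to 57 can be replayed in a small Python sketch: the register status table stores the tag of each register's latest producer, so AD F2, F0 is bound to FLB1 and AD F3, F0 to FLB2 without any register being renamed. The initial register values, the loaded values, and the choice to let FLB2 finish before FLB1 are all hypothetical.

```python
# Hypothetical initial FLR contents; buffer and station names double as tags.
fl_r = {"F0": 0.0, "F2": 10.0, "F3": 20.0}
status = {r: None for r in fl_r}  # register -> tag of its latest producer (None = not busy)
stations = {}                     # adder station -> list of ("tag", name) or ("value", x)


def read(reg):
    """Busy register: hand out the producer's tag; otherwise hand out the value."""
    return ("tag", status[reg]) if status[reg] else ("value", fl_r[reg])


def load(reg, buffer):
    status[reg] = buffer  # LD reg, buffer: the FLB itself is the producer


def add(reg, src, station):
    stations[station] = [read(reg), read(src)]  # AD reg, src: reg <- reg + src
    status[reg] = station


def broadcast(tag, value):
    for ops in stations.values():
        for k, (kind, x) in enumerate(ops):
            if kind == "tag" and x == tag:
                ops[k] = ("value", value)
    for reg, t in status.items():
        if t == tag:  # only the latest producer of a register may update it
            fl_r[reg], status[reg] = value, None


load("F0", "FLB1")
add("F2", "F0", "A1")
load("F0", "FLB2")
add("F3", "F0", "A2")
snapshot = {name: list(ops) for name, ops in stations.items()}

# Let the second load finish first; the first load's value must not land in F0.
broadcast("FLB2", 2.0)
broadcast("FLB1", 1.0)
```

After dispatch, A1 waits on FLB1 and A2 on FLB2. Even with the loads completing in reverse order, each adder receives the value it was bound to and F0 ends up holding 2.0, the value of the later load.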
  • 58. Timing Sequence with Busy Bit for LD F0, FLB1; AD F2, F0; LD F0, FLB2; AD F3, F0. [Timing diagram; stage labels: D (decode), AG (address generation), FLB, T, EX (execute), WB (write back)]
  • 59. Timing Sequence with Reservation Station for the same four instructions (LD F0, FLB1; AD F2, F0; LD F0, FLB2; AD F3, F0). [Timing diagram; same stage labels]
  • 60. The Side Effect of the Register Machine: what are the differences between a circuit and a register machine?
  • 61. The Side Effect of the Register Machine: a register machine is general purpose and control-driven, and its dependences are implicit, carried through registers. A circuit is special purpose and data-driven, and its dependences are exposed. ...But registers are scarce.
  • 62. Conclusion: the Tomasulo algorithm has nothing to do with register renaming. It resolves WAR and WAW hazards by eliminating the side effect of using registers to pass values. With the Tomasulo algorithm, the execution of a program is driven by data flow, thus exploiting maximum concurrency.