SlideShare une entreprise Scribd logo
1  sur  20
Télécharger pour lire hors ligne
Evaluating Computers:
Bigger, better, faster, more?
1
What do you want in a computer?
2
What do you want in a computer?
• Low latency -- one unit of work in minimum time
• 1/latency = responsiveness
• High throughput -- maximum work per time
• High bandwidth (BW)
• Low cost
• Low power -- minimum jules per time
• Low energy -- minimum jules per work
• Reliability -- Mean time to failure (MTTF)
• Derived metrics
• responsiveness/dollar
• BW/$
• BW/Watt
• Work/Jule
• Energy * latency -- Energy delay product
• MTTF/$
3
Latency
• This is the simplest kind of performance
• How long does it take the computer to perform
a task?
• The task at hand depends on the situation.
• Usually measured in seconds
• Also measured in clock cycles
• Caution: if you are comparing two different system, you
must ensure that the cycle times are the same.
4
Measuring Latency
• Stop watch!
• System calls
• gettimeofday()
• System.currentTimeMillis()
• Command line
• time <command>
5
Where latency matters
• Application responsiveness
• Any time a person is waiting.
• GUIs
• Games
• Internet services (from the users perspective)
• “Real-time” applications
• Tight constraints enforced by the real world
• Anti-lock braking systems
• Manufacturing control
• Multi-media applications
• The cost of poor latency
• If you are selling computer time, latency is money.
6
Latency and Performance
• By definition:
• Performance = 1/Latency
• If Performance(X) > Performance(Y), X is faster.
• If Perf(X)/Perf(Y) = S, X is S times faster thanY.
• Equivalently: Latency(Y)/Latency(X) = S
• When we need to talk about specifically about
other kinds of “performance” we must be more
specific.
7
The Performance Equation
• We would like to model how architecture impacts
performance (latency)
• This means we need to quantify performance in
terms of architectural parameters.
• Instructions -- this is the basic unit of work for a
processor
• Cycle time -- these two give us a notion of time.
• Cycles
• The first fundamental theorem of computer
architecture:
Latency = Instructions * Cycles/Instruction *
Seconds/Cycle
8
The Performance Equation
• The units work out! Remember your
dimensional analysis!
• Cycles/Instruction == CPI
• Seconds/Cycle == 1/hz
• Example:
• 1GHz clock
• 1 billion instructions
• CPI = 4
• What is the latency?
9
Latency = Instructions * Cycles/Instruction * Seconds/Cycle
Examples
• gcc runs in 100 sec on a 1 GHz machine
– How many cycles does it take?
• gcc runs in 75 sec on a 600 MHz machine
– How many cycles does it take?
100G cycles
45G cycles
Latency = Instructions * Cycles/Instruction * Seconds/Cycle
How can this be?
• Different Instruction count?
• Different ISAs ?
• Different compilers ?
• Different CPI?
• underlying machine implementation
• Microarchitecture
• Different cycle time?
• New process technology
• Microarchitecture
11
Latency = Instructions * Cycles/Instruction * Seconds/Cycle
Computing Average CPI
• Instruction execution time depends on instruction
time (we’ll get into why this is so later on)
• Integer +, -, <<, |, & -- 1 cycle
• Integer *, /, -- 5-10 cycles
• Floating point +, - -- 3-4 cycles
• Floating point *, /, sqrt() -- 10-30 cycles
• Loads/stores -- variable
• All theses values depend on the particular implementation,
not the ISA
• Total CPI depends on the workload’s Instruction mix
-- how many of each type of instruction executes
• What program is running?
• How was it compiled?
12
The Compiler’s Role
• Compilers affect CPI…
• Wise instruction selection
• “Strength reduction”: x*2n -> x << n
• Use registers to eliminate loads and stores
• More compact code -> less waiting for instructions
• …and instruction count
• Common sub-expression elimination
• Use registers to eliminate loads and stores
13
Stupid Compiler
int i, sum = 0;
for(i=0;i<10;i++)
sum += i;
sw 0($sp), $0 #sum = 0
sw 4($sp), $0 #i = 0
loop:
lw $1, 4($sp)
sub $3, $1, 10
beq $3, $0, end
lw $2, 0($sp)
add $2, $2, $1
st 0($sp), $2
addi $1, $1, 1
st 4($sp), $1
b loop
end:
Type CPI Static # dyn #
mem 5 6 42
int 1 3 30
br 1 2 20
Total 2.8 11 92
(5*42 + 1*30 + 1*20)/92 = 2.8
Smart Compiler
int i, sum = 0;
for(i=0;i<10;i++)
sum += i;
add $1, $0, $0 # i
add $2, $0, $0 # sum
loop:
sub $3, $1, 10
beq $3, $0, end
add $2, $2, $1
addi $1, $1, 1
b loop
end:
sw 0($sp), $2
Type CPI Static # dyn #
mem 5 1 1
int 1 5 32
br 1 2 20
Total 1.01 8 53
(5*1 + 1*32 + 1*20)/53 = 2.8
Live demo
16
Program inputs affect CPI too!
int rand[1000] = {random 0s and 1s }
for(i=0;i<1000;i++)
if(rand[i]) sum -= i;
else sum *= i;
int ones[1000] = {1, 1, ...}
for(i=0;i<1000;i++)
if(ones[i]) sum -= i;
else sum *= i;
• Data-dependent computation
• Data-dependent micro-architectural behavior
–Processors are faster when the computation is
predictable (more later)
Live demo
18
• Meaningful CPI exists only:
• For a particular program with a particular compiler
• ....with a particular input.
• You MUST consider all 3 to get accurate latency estimations
or machine speed comparisons
• Instruction Set
• Compiler
• Implementation of Instruction Set (386 vs Pentium)
• Processor Freq (600 Mhz vs 1 GHz)
• Same high level program with same input
• “wall clock” measurements are always comparable.
• If the workloads (app + inputs) are the same
19
Making Meaningful Comparisons
Latency = Instructions * Cycles/Instruction * Seconds/Cycle
The Performance Equation
• Clock rate =
• Instruction count =
• Latency =
• Find the CPI!
20
Latency = Instructions * Cycles/Instruction * Seconds/Cycle

Contenu connexe

Similaire à 02 performance

Cpu performance matrix
Cpu performance matrixCpu performance matrix
Cpu performance matrixRehman baig
 
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PerformanceLec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PerformanceHsien-Hsin Sean Lee, Ph.D.
 
Computer architecture short note (version 8)
Computer architecture short note (version 8)Computer architecture short note (version 8)
Computer architecture short note (version 8)Nimmi Weeraddana
 
L07_performance and cost in advanced hardware- computer architecture.pptx
L07_performance and cost in advanced hardware- computer architecture.pptxL07_performance and cost in advanced hardware- computer architecture.pptx
L07_performance and cost in advanced hardware- computer architecture.pptxIsaac383415
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performancePiotr Przymus
 
Resource Management in (Embedded) Real-Time Systems
Resource Management in (Embedded) Real-Time SystemsResource Management in (Embedded) Real-Time Systems
Resource Management in (Embedded) Real-Time Systemsjeronimored
 
Class5_DataloggerProgrammingArduino.pptx
Class5_DataloggerProgrammingArduino.pptxClass5_DataloggerProgrammingArduino.pptx
Class5_DataloggerProgrammingArduino.pptxHebaEng
 
Operating Systems Process Scheduling Algorithms
Operating Systems   Process Scheduling AlgorithmsOperating Systems   Process Scheduling Algorithms
Operating Systems Process Scheduling Algorithmssathish sak
 
cs1311lecture25wdl.ppt
cs1311lecture25wdl.pptcs1311lecture25wdl.ppt
cs1311lecture25wdl.pptFannyBellows
 
Document 14 (6).pdf
Document 14 (6).pdfDocument 14 (6).pdf
Document 14 (6).pdfRajMantry
 
Parallel Computing - Lec 6
Parallel Computing - Lec 6Parallel Computing - Lec 6
Parallel Computing - Lec 6Shah Zaib
 
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERSVTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERSvtunotesbysree
 
Evaluation of computer performance
Evaluation of computer performanceEvaluation of computer performance
Evaluation of computer performancePrasenjit Dey
 
Let's write a Debugger!
Let's write a Debugger!Let's write a Debugger!
Let's write a Debugger!Levente Kurusa
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiTakuya ASADA
 
Kiến trúc máy tính-COE 301 - Performance.ppt
Kiến trúc máy tính-COE 301 - Performance.pptKiến trúc máy tính-COE 301 - Performance.ppt
Kiến trúc máy tính-COE 301 - Performance.pptTriTrang4
 
fggggggggggggggggggggggggggggggfffffffffffffffffff
fggggggggggggggggggggggggggggggffffffffffffffffffffggggggggggggggggggggggggggggggfffffffffffffffffff
fggggggggggggggggggggggggggggggfffffffffffffffffffadugnanegero
 

Similaire à 02 performance (20)

Cpu performance matrix
Cpu performance matrixCpu performance matrix
Cpu performance matrix
 
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PerformanceLec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
Lec3 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Performance
 
1571 mean
1571 mean1571 mean
1571 mean
 
Computer architecture short note (version 8)
Computer architecture short note (version 8)Computer architecture short note (version 8)
Computer architecture short note (version 8)
 
L07_performance and cost in advanced hardware- computer architecture.pptx
L07_performance and cost in advanced hardware- computer architecture.pptxL07_performance and cost in advanced hardware- computer architecture.pptx
L07_performance and cost in advanced hardware- computer architecture.pptx
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
Resource Management in (Embedded) Real-Time Systems
Resource Management in (Embedded) Real-Time SystemsResource Management in (Embedded) Real-Time Systems
Resource Management in (Embedded) Real-Time Systems
 
Class5_DataloggerProgrammingArduino.pptx
Class5_DataloggerProgrammingArduino.pptxClass5_DataloggerProgrammingArduino.pptx
Class5_DataloggerProgrammingArduino.pptx
 
22CS201 COA
22CS201 COA22CS201 COA
22CS201 COA
 
Operating Systems Process Scheduling Algorithms
Operating Systems   Process Scheduling AlgorithmsOperating Systems   Process Scheduling Algorithms
Operating Systems Process Scheduling Algorithms
 
cs1311lecture25wdl.ppt
cs1311lecture25wdl.pptcs1311lecture25wdl.ppt
cs1311lecture25wdl.ppt
 
Document 14 (6).pdf
Document 14 (6).pdfDocument 14 (6).pdf
Document 14 (6).pdf
 
Parallel Computing - Lec 6
Parallel Computing - Lec 6Parallel Computing - Lec 6
Parallel Computing - Lec 6
 
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERSVTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
 
Evaluation of computer performance
Evaluation of computer performanceEvaluation of computer performance
Evaluation of computer performance
 
Let's write a Debugger!
Let's write a Debugger!Let's write a Debugger!
Let's write a Debugger!
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgi
 
Kiến trúc máy tính-COE 301 - Performance.ppt
Kiến trúc máy tính-COE 301 - Performance.pptKiến trúc máy tính-COE 301 - Performance.ppt
Kiến trúc máy tính-COE 301 - Performance.ppt
 
3_process_scheduling.ppt
3_process_scheduling.ppt3_process_scheduling.ppt
3_process_scheduling.ppt
 
fggggggggggggggggggggggggggggggfffffffffffffffffff
fggggggggggggggggggggggggggggggffffffffffffffffffffggggggggggggggggggggggggggggggfffffffffffffffffff
fggggggggggggggggggggggggggggggfffffffffffffffffff
 

Plus de marangburu42

Hennchthree 161102111515
Hennchthree 161102111515Hennchthree 161102111515
Hennchthree 161102111515marangburu42
 
Sequential circuits
Sequential circuitsSequential circuits
Sequential circuitsmarangburu42
 
Combinational circuits
Combinational circuitsCombinational circuits
Combinational circuitsmarangburu42
 
Hennchthree 160912095304
Hennchthree 160912095304Hennchthree 160912095304
Hennchthree 160912095304marangburu42
 
Sequential circuits
Sequential circuitsSequential circuits
Sequential circuitsmarangburu42
 
Combinational circuits
Combinational circuitsCombinational circuits
Combinational circuitsmarangburu42
 
Karnaugh mapping allaboutcircuits
Karnaugh mapping allaboutcircuitsKarnaugh mapping allaboutcircuits
Karnaugh mapping allaboutcircuitsmarangburu42
 
Aac boolean formulae
Aac   boolean formulaeAac   boolean formulae
Aac boolean formulaemarangburu42
 
Virtualmemoryfinal 161019175858
Virtualmemoryfinal 161019175858Virtualmemoryfinal 161019175858
Virtualmemoryfinal 161019175858marangburu42
 
File system interfacefinal
File system interfacefinalFile system interfacefinal
File system interfacefinalmarangburu42
 
File systemimplementationfinal
File systemimplementationfinalFile systemimplementationfinal
File systemimplementationfinalmarangburu42
 
Mass storage structurefinal
Mass storage structurefinalMass storage structurefinal
Mass storage structurefinalmarangburu42
 
All aboutcircuits karnaugh maps
All aboutcircuits karnaugh mapsAll aboutcircuits karnaugh maps
All aboutcircuits karnaugh mapsmarangburu42
 
Virtual memoryfinal
Virtual memoryfinalVirtual memoryfinal
Virtual memoryfinalmarangburu42
 
Mainmemoryfinal 161019122029
Mainmemoryfinal 161019122029Mainmemoryfinal 161019122029
Mainmemoryfinal 161019122029marangburu42
 

Plus de marangburu42 (20)

Hol
HolHol
Hol
 
Write miss
Write missWrite miss
Write miss
 
Hennchthree 161102111515
Hennchthree 161102111515Hennchthree 161102111515
Hennchthree 161102111515
 
Hennchthree
HennchthreeHennchthree
Hennchthree
 
Hennchthree
HennchthreeHennchthree
Hennchthree
 
Sequential circuits
Sequential circuitsSequential circuits
Sequential circuits
 
Combinational circuits
Combinational circuitsCombinational circuits
Combinational circuits
 
Hennchthree 160912095304
Hennchthree 160912095304Hennchthree 160912095304
Hennchthree 160912095304
 
Sequential circuits
Sequential circuitsSequential circuits
Sequential circuits
 
Combinational circuits
Combinational circuitsCombinational circuits
Combinational circuits
 
Karnaugh mapping allaboutcircuits
Karnaugh mapping allaboutcircuitsKarnaugh mapping allaboutcircuits
Karnaugh mapping allaboutcircuits
 
Aac boolean formulae
Aac   boolean formulaeAac   boolean formulae
Aac boolean formulae
 
Virtualmemoryfinal 161019175858
Virtualmemoryfinal 161019175858Virtualmemoryfinal 161019175858
Virtualmemoryfinal 161019175858
 
Io systems final
Io systems finalIo systems final
Io systems final
 
File system interfacefinal
File system interfacefinalFile system interfacefinal
File system interfacefinal
 
File systemimplementationfinal
File systemimplementationfinalFile systemimplementationfinal
File systemimplementationfinal
 
Mass storage structurefinal
Mass storage structurefinalMass storage structurefinal
Mass storage structurefinal
 
All aboutcircuits karnaugh maps
All aboutcircuits karnaugh mapsAll aboutcircuits karnaugh maps
All aboutcircuits karnaugh maps
 
Virtual memoryfinal
Virtual memoryfinalVirtual memoryfinal
Virtual memoryfinal
 
Mainmemoryfinal 161019122029
Mainmemoryfinal 161019122029Mainmemoryfinal 161019122029
Mainmemoryfinal 161019122029
 

Dernier

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 

Dernier (20)

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 

02 performance

  • 2. What do you want in a computer? 2
  • 3. What do you want in a computer? • Low latency -- one unit of work in minimum time • 1/latency = responsiveness • High throughput -- maximum work per time • High bandwidth (BW) • Low cost • Low power -- minimum jules per time • Low energy -- minimum jules per work • Reliability -- Mean time to failure (MTTF) • Derived metrics • responsiveness/dollar • BW/$ • BW/Watt • Work/Jule • Energy * latency -- Energy delay product • MTTF/$ 3
  • 4. Latency • This is the simplest kind of performance • How long does it take the computer to perform a task? • The task at hand depends on the situation. • Usually measured in seconds • Also measured in clock cycles • Caution: if you are comparing two different system, you must ensure that the cycle times are the same. 4
  • 5. Measuring Latency • Stop watch! • System calls • gettimeofday() • System.currentTimeMillis() • Command line • time <command> 5
  • 6. Where latency matters • Application responsiveness • Any time a person is waiting. • GUIs • Games • Internet services (from the users perspective) • “Real-time” applications • Tight constraints enforced by the real world • Anti-lock braking systems • Manufacturing control • Multi-media applications • The cost of poor latency • If you are selling computer time, latency is money. 6
  • 7. Latency and Performance • By definition: • Performance = 1/Latency • If Performance(X) > Performance(Y), X is faster. • If Perf(X)/Perf(Y) = S, X is S times faster thanY. • Equivalently: Latency(Y)/Latency(X) = S • When we need to talk about specifically about other kinds of “performance” we must be more specific. 7
  • 8. The Performance Equation • We would like to model how architecture impacts performance (latency) • This means we need to quantify performance in terms of architectural parameters. • Instructions -- this is the basic unit of work for a processor • Cycle time -- these two give us a notion of time. • Cycles • The first fundamental theorem of computer architecture: Latency = Instructions * Cycles/Instruction * Seconds/Cycle 8
  • 9. The Performance Equation • The units work out! Remember your dimensional analysis! • Cycles/Instruction == CPI • Seconds/Cycle == 1/hz • Example: • 1GHz clock • 1 billion instructions • CPI = 4 • What is the latency? 9 Latency = Instructions * Cycles/Instruction * Seconds/Cycle
  • 10. Examples • gcc runs in 100 sec on a 1 GHz machine – How many cycles does it take? • gcc runs in 75 sec on a 600 MHz machine – How many cycles does it take? 100G cycles 45G cycles Latency = Instructions * Cycles/Instruction * Seconds/Cycle
  • 11. How can this be? • Different Instruction count? • Different ISAs ? • Different compilers ? • Different CPI? • underlying machine implementation • Microarchitecture • Different cycle time? • New process technology • Microarchitecture 11 Latency = Instructions * Cycles/Instruction * Seconds/Cycle
  • 12. Computing Average CPI • Instruction execution time depends on instruction time (we’ll get into why this is so later on) • Integer +, -, <<, |, & -- 1 cycle • Integer *, /, -- 5-10 cycles • Floating point +, - -- 3-4 cycles • Floating point *, /, sqrt() -- 10-30 cycles • Loads/stores -- variable • All theses values depend on the particular implementation, not the ISA • Total CPI depends on the workload’s Instruction mix -- how many of each type of instruction executes • What program is running? • How was it compiled? 12
  • 13. The Compiler’s Role • Compilers affect CPI… • Wise instruction selection • “Strength reduction”: x*2n -> x << n • Use registers to eliminate loads and stores • More compact code -> less waiting for instructions • …and instruction count • Common sub-expression elimination • Use registers to eliminate loads and stores 13
  • 14. Stupid Compiler int i, sum = 0; for(i=0;i<10;i++) sum += i; sw 0($sp), $0 #sum = 0 sw 4($sp), $0 #i = 0 loop: lw $1, 4($sp) sub $3, $1, 10 beq $3, $0, end lw $2, 0($sp) add $2, $2, $1 st 0($sp), $2 addi $1, $1, 1 st 4($sp), $1 b loop end: Type CPI Static # dyn # mem 5 6 42 int 1 3 30 br 1 2 20 Total 2.8 11 92 (5*42 + 1*30 + 1*20)/92 = 2.8
  • 15. Smart Compiler int i, sum = 0; for(i=0;i<10;i++) sum += i; add $1, $0, $0 # i add $2, $0, $0 # sum loop: sub $3, $1, 10 beq $3, $0, end add $2, $2, $1 addi $1, $1, 1 b loop end: sw 0($sp), $2 Type CPI Static # dyn # mem 5 1 1 int 1 5 32 br 1 2 20 Total 1.01 8 53 (5*1 + 1*32 + 1*20)/53 = 2.8
  • 17. Program inputs affect CPI too! int rand[1000] = {random 0s and 1s } for(i=0;i<1000;i++) if(rand[i]) sum -= i; else sum *= i; int ones[1000] = {1, 1, ...} for(i=0;i<1000;i++) if(ones[i]) sum -= i; else sum *= i; • Data-dependent computation • Data-dependent micro-architectural behavior –Processors are faster when the computation is predictable (more later)
  • 19. • Meaningful CPI exists only: • For a particular program with a particular compiler • ....with a particular input. • You MUST consider all 3 to get accurate latency estimations or machine speed comparisons • Instruction Set • Compiler • Implementation of Instruction Set (386 vs Pentium) • Processor Freq (600 Mhz vs 1 GHz) • Same high level program with same input • “wall clock” measurements are always comparable. • If the workloads (app + inputs) are the same 19 Making Meaningful Comparisons Latency = Instructions * Cycles/Instruction * Seconds/Cycle
  • 20. The Performance Equation • Clock rate = • Instruction count = • Latency = • Find the CPI! 20 Latency = Instructions * Cycles/Instruction * Seconds/Cycle