FUNDAMENTALS OF COMPUTER DESIGN
1. EC6009-Advanced Computer Architecture
UNIT I FUNDAMENTALS OF COMPUTER DESIGN 9
Review of Fundamentals of CPU, Memory and IO – Trends in technology, power, energy
and cost, Dependability - Performance Evaluation
UNIT II INSTRUCTION LEVEL PARALLELISM 9
ILP concepts – Pipelining overview - Compiler Techniques for Exposing ILP – Dynamic
Branch Prediction – Dynamic Scheduling – Multiple instruction Issue – Hardware Based
Speculation – Static scheduling - Multi-threading - Limitations of ILP – Case Studies.
UNIT III DATA-LEVEL PARALLELISM 9
Vector architecture – SIMD extensions – Graphics Processing units – Loop level
parallelism.
UNIT IV THREAD LEVEL PARALLELISM 9
Symmetric and Distributed Shared Memory Architectures – Performance Issues –
Synchronization – Models of Memory Consistency – Case studies: Intel i7 Processor, SMT
& CMP Processors
UNIT V MEMORY AND I/O 9
Cache Performance – Reducing Cache Miss Penalty and Miss Rate – Reducing Hit Time –
Main Memory and Performance – Memory Technology. Types of Storage Devices – Buses
– RAID – Reliability, Availability and Dependability – I/O Performance Measures.
7/3/2019 VII-ECE-B
2. On completion of the course, the students will be able to
CO1 Explain the performance of different architectures with respect to various parameters K2
CO2 Describe the performance of different ILP techniques K2
CO3 Discuss the performance of different architectures in exploiting DLP K2
CO4 Illustrate the concepts of Transport level protocol. K2
CO5 Distinguish cache and memory related issues in multiprocessors. K2
EC6009-Advanced Computer Architecture
CO/PO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1 3 2 1
CO2 3 2 1
CO3 3 2 1
CO4 3 2 1
CO5 3 2 2
C404 3 2 1 - - - - - - - -- - - -
3. • PO 1 - Engineering Knowledge
• PO 2 - Problem analysis
• PO 3 - Design / development of solutions
• PO 4 - Conduct investigations of complex problems
• PO 5 - Modern tool usage
• PO 6 - Engineer and Society
• PO 7 - Environment and sustainability
• PO 8 - Ethics
• PO 9 - Individual and Team-work
• PO 10 - Communication
• PO 11 - Project management and finance
• PO 12 - Life-long learning
8. Current Trends in Architecture
• Cannot continue to leverage Instruction-Level parallelism (ILP)
– Single processor performance improvement ended in 2003
• New models for performance:
– Data-level parallelism (DLP)
– Thread-level parallelism (TLP)
– Request-level parallelism (RLP)
• These require explicit restructuring of the application
9. Organization, Hardware, and Architecture
• Organization: includes the high-level aspects of a computer’s
design.
– Memory system, the memory interconnect, and the design of
the internal processor or CPU (arithmetic, logic, branching, and
data transfer).
– For example, the AMD Opteron 64 and Intel P4 have the same ISA,
but they have different internal pipeline and cache
organizations.
• Hardware: detailed logic design and the packaging technology.
– For example, the P4 and Mobile P4 have the same ISA and
organization, but they differ in clock frequency and
memory system.
• Architecture: covers all three aspects of computer design –
instruction set architecture, organization, and hardware.
– The designer must meet functional requirements as well as price,
power, performance, and availability goals.
10. Instruction Set Architecture: Critical Interface
• Properties of a good abstraction
– Lasts through many generations (portability)
– Used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels
[Figure: the instruction set as the interface between software (above) and hardware (below)]
11. Instruction Set Architecture (ISA)
– Class of ISA:
The ISA is the actual programmer-visible instruction set.
– General-purpose register architecture (register-memory or load-store)
– Stack architecture
– Memory addressing:
a program running on a 32-bit processor can address up to
4 GB (2^32 bytes) of address space.
– Addressing modes:
direct, indirect, etc.
– Types and sizes of operands:
the common types supported by an ISA include signed and
unsigned integers and single- and double-precision floating-point
numbers.
– Data processing and control flow instructions.
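The 4 GB figure for a 32-bit address can be checked with a few lines of Python. This is a minimal sketch that assumes byte addressing; the function name `addressable_bytes` is introduced here for illustration only.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width.

    Assumes byte-addressable memory (one address per byte).
    """
    return 2 ** address_bits


GIB = 2 ** 30  # bytes in one gibibyte

if __name__ == "__main__":
    # A 32-bit address reaches 2^32 bytes, i.e. 4 GiB.
    print(addressable_bytes(32), "bytes =", addressable_bytes(32) // GIB, "GiB")
```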
12. Classes of Computers
• Personal Mobile Device (PMD)
– e.g. smartphones, tablet computers
– Emphasis on energy efficiency and real-time
• Desktop Computing (workstations)
– Emphasis on price-performance
• Servers (mainframes)
– Emphasis on availability, scalability, throughput
• Clusters / Warehouse Scale Computers
– Used for “Software as a Service (SaaS)”
– Emphasis on availability and price-performance
– Sub-class: Supercomputers, emphasis: floating-point
performance and fast internal networks
• Embedded Computers
– Emphasis: price
13. Trends in Technology
• A successful new ISA may last decades, for example, IBM
mainframe.
• Four critical technologies
– Integrated circuit logic technology: transistor density
increased by about 35% per year, quadrupling in
somewhat over four years;
– Semiconductor DRAM (Dynamic Random-Access
Memory): capacity increases by about 40% per year,
doubling roughly every two years;
– Magnetic disk technology: improvement has been a roller coaster
of rates; disks are 50-100 times cheaper per bit than DRAM.
– Network technology: network performance depends
both on the performance of switches and transmission.
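The growth rates quoted above can be turned into doubling and quadrupling times with compound-growth arithmetic. The sketch below is illustrative only; the helper `years_to_multiply` is a name introduced here, and it assumes a constant annual growth rate.

```python
import math


def years_to_multiply(annual_growth: float, factor: float) -> float:
    """Years needed to grow by `factor` at a constant annual growth rate.

    Solves (1 + annual_growth) ** years == factor for years.
    """
    return math.log(factor) / math.log(1.0 + annual_growth)


# Rates taken from the slide: 35% per year for logic density, 40% for DRAM.
logic_quadruple_years = years_to_multiply(0.35, 4.0)
dram_double_years = years_to_multiply(0.40, 2.0)

if __name__ == "__main__":
    print(f"Logic density quadruples in about {logic_quadruple_years:.1f} years")
    print(f"DRAM capacity doubles in about {dram_double_years:.1f} years")
```

Under these assumptions the results agree with the slide's wording: somewhat over four years to quadruple, and roughly two years to double.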
14. Scaling of Transistor Performance and Wires
• Feature size: the minimum size of a transistor or a wire in
either the x or y dimension.
– From 10 microns in 1971 to 0.09 microns (90 nm) in 2006;
– The density of transistors increases quadratically with a
linear decrease in feature size;
– Transistor performance improves linearly with decreasing
feature size;
– Because of the improvement in transistor density, CPUs moved
quickly from 4-bit to 8-bit, to 16-bit, to 32-bit
microprocessors;
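The quadratic relation between feature size and density can be illustrated with the two feature sizes quoted above. This is a first-order sketch that assumes ideal scaling in both dimensions; real processes do not scale this cleanly, and `density_gain` is a name introduced here.

```python
def density_gain(old_feature_um: float, new_feature_um: float) -> float:
    """Ideal transistor-density gain when feature size shrinks.

    Density grows with the square of the linear shrink factor,
    because both the x and y dimensions shrink.
    """
    linear_shrink = old_feature_um / new_feature_um
    return linear_shrink ** 2


if __name__ == "__main__":
    # 10 microns (1971) down to 0.09 microns (2006), as on the slide.
    gain = density_gain(10.0, 0.09)
    print(f"Linear shrink: {10.0 / 0.09:.0f}x, ideal density gain: {gain:.0f}x")
```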
15. Performance Trends: Bandwidth over Latency
• Bandwidth or throughput: the total amount of work done in a given time.
– Such as megabytes per second for a disk transfer.
• Latency or response time: the time between the start and the completion of an event.
– Such as milliseconds for a disk access.
16. Power
• Power also poses challenges as devices are
scaled.
– Dynamic power (watts, W) in a CMOS chip: traditionally, the
dominant energy consumption has been
in switching transistors.
$$\text{Power}_{\text{dynamic}} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$
– Mobile devices care about battery life more
than power, so energy is the proper metric,
measured in joules:
$$\text{Energy}_{\text{dynamic}} = \text{Capacitive load} \times \text{Voltage}^2$$
† In modern VLSI, the exact power measurement is the sum
$$\text{Power}_{\text{total}} = \text{Power}_{\text{dynamic}} + \text{Power}_{\text{static}} + \text{Power}_{\text{leakage}}$$
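The dynamic power and energy relations can be explored numerically. The sketch below assumes the forms Power = 1/2 × capacitive load × voltage² × frequency switched and Energy = capacitive load × voltage²; the 15% reduction scenario and the normalized unit values are hypothetical, chosen only to show how strongly voltage matters.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power: 1/2 * C * V^2 * f."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency


def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Dynamic energy per transition pair: C * V^2 (independent of frequency)."""
    return capacitive_load * voltage ** 2


# Hypothetical scenario: lower both voltage and frequency by 15%.
# Values are normalized to 1.0, so only the ratios are meaningful.
power_ratio = dynamic_power(1.0, 0.85, 0.85) / dynamic_power(1.0, 1.0, 1.0)
energy_ratio = dynamic_energy(1.0, 0.85) / dynamic_energy(1.0, 1.0)

if __name__ == "__main__":
    print(f"Power falls to {power_ratio:.2f} of the original")
    print(f"Energy falls to {energy_ratio:.2f} of the original")
```

Because voltage enters squared, lowering it reduces energy even when the clock rate is unchanged, which is why it matters for battery-powered devices.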
17. Power
• Static power: an important issue because leakage
current flows even when a transistor is off:
$$\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}$$
– Thus, as the number of transistors ↑, static power ↑;
– As feature size ↓, static power ↑ (why? The answer lies in
the VLSI area).
18. Silicon Wafer and Dies
• Exponential cost decrease – technology
basically the same:
– A wafer is tested and chopped into dies that are
packaged.
[Figure: AMD K8 wafer, showing the dies and the partial dies along the edge. Source: http://www.amd.com]
19. Cost of an Integrated Circuit (IC)
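$$\text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$$
$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$
(Cost of die: a greater portion of the cost that varies between
machines; sensitive to die size)
$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer radius})^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$
(The second term accounts for the # of dies along the edge)
$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defect density} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$
Today's technology: α = 4.0, defect density 0.4 ~ 0.8 per cm²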
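The cost formulas can be chained into a small calculator. This is a sketch under stated assumptions: it uses the dies-per-wafer estimate (wafer area over die area, minus an edge correction) and the die-yield model with α = 4.0; the 30 cm wafer, 1.5 cm × 1.5 cm die, and $5000 wafer cost are hypothetical inputs, not figures from the slides.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Estimated dies per wafer: wafer area / die area, minus dies lost along the edge."""
    radius = wafer_diameter_cm / 2.0
    whole = math.pi * radius ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return whole - edge_loss


def die_yield(defect_density_per_cm2: float, die_area_cm2: float,
              alpha: float = 4.0, wafer_yield: float = 1.0) -> float:
    """Die yield = wafer yield * (1 + defect density * die area / alpha) ** -alpha."""
    return wafer_yield * (1.0 + defect_density_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def cost_per_good_die(wafer_cost: float, wafer_diameter_cm: float,
                      die_area_cm2: float, defect_density_per_cm2: float) -> float:
    """Cost of die = cost of wafer / (dies per wafer * die yield)."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defect_density_per_cm2, die_area_cm2))
    return wafer_cost / good_dies


if __name__ == "__main__":
    area = 1.5 * 1.5  # hypothetical die, 1.5 cm on a side
    print(f"Dies per wafer: {dies_per_wafer(30.0, area):.1f}")
    print(f"Die yield:      {die_yield(0.4, area):.3f}")
    print(f"Cost per die:   {cost_per_good_die(5000.0, 30.0, area, 0.4):.2f}")
```

The dies-per-wafer result is left as a fraction here; a production estimate would round it down to a whole number of dies.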
20. Response Time, Throughput, and Performance
• Response time : the time between the start and the
completion of an event – also referred to as execution
time.
– This is what the computer user is interested in.
• Throughput : the total amount of work done in a given
time.
– This is what the administrator of a large data processing center
may be interested in.
• In comparing design alternatives,
– The phrase “X is faster than Y” is used here to mean that the
response time or execution time is lower on X than on Y.
– In particular, “X is n times faster than Y” or “the throughput of
X is n times higher than Y” will mean
$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$$
21. Performance Measuring
• Execution time is the reciprocal of performance:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$
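The relation between execution time, performance, and "n times faster" can be checked with a short sketch. The execution times below (10 s for X, 15 s for Y) are hypothetical, and the function names are introduced here for illustration.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(execution_time_x_s: float, execution_time_y_s: float) -> float:
    """n such that 'X is n times faster than Y': time_Y / time_X."""
    return execution_time_y_s / execution_time_x_s


if __name__ == "__main__":
    time_x, time_y = 10.0, 15.0  # hypothetical execution times in seconds
    n_from_times = times_faster(time_x, time_y)
    n_from_perf = performance(time_x) / performance(time_y)
    # Both routes give the same n, as the derivation above shows.
    print(n_from_times, n_from_perf)
```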
22. Reliable Measure – User CPU Time
• Response time may include disk access, memory access,
input/output activities, CPU event and operating system
overhead – everything…
• In order to get an accurate measure of performance, we use
CPU time instead of using response time.
• CPU time is the time the CPU spends computing a program and
does not include time spent waiting for I/O or running other
programs.
• CPU time can also be divided into user CPU time (program) and
system CPU time (OS).
• Running a program under the UNIX time command gives, for example, 90.7s 12.9s 2:39 65%
(user CPU time, system CPU time, total elapsed response time, and the percentage of elapsed time spent as CPU time).
• In our performance measures, we use user CPU time because
it is independent of the OS and other factors.
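The percentage in the sample time output can be reproduced from the other three fields. The sketch below uses the numbers quoted above (90.7 s user, 12.9 s system, 2:39 elapsed); the function name `cpu_fraction` is introduced here for illustration.

```python
def cpu_fraction(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of elapsed (response) time that was CPU time."""
    return (user_s + system_s) / elapsed_s


elapsed_s = 2 * 60 + 39          # 2:39 elapsed, converted to seconds
fraction = cpu_fraction(90.7, 12.9, elapsed_s)
waiting_s = elapsed_s - (90.7 + 12.9)  # time not spent computing this program

if __name__ == "__main__":
    print(f"CPU time is {fraction:.0%} of elapsed time")
    print(f"About {waiting_s:.1f} s went to I/O waits or other programs")
```

More than a third of the elapsed time was not spent computing the program at all, which is why response time alone is a poor measure of CPU performance.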
23. CPU Performance
• Essentially all computers are constructed using a clock
running at a constant rate (its discrete time events are called ticks,
clock ticks, clock periods, clocks, cycles, or clock cycles).
– Clock rate: today in GHz
– Clock cycle time: clock cycle time = 1/clock rate
– Ex. 1 GHz clock rate = 1 ns cycle time
• Thus, the CPU time for a program can be expressed
two ways:
$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
Or,
$$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
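The two equivalent forms of CPU time can be compared directly. This is a minimal sketch; the cycle count of 2 billion is a hypothetical value, while the 1 GHz clock matches the example on the slide.

```python
def clock_cycle_time_s(clock_rate_hz: float) -> float:
    """Clock cycle time is the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz


def cpu_time_from_cycle_time(cpu_clock_cycles: float, cycle_time_s: float) -> float:
    """CPU time = CPU clock cycles for a program * clock cycle time."""
    return cpu_clock_cycles * cycle_time_s


def cpu_time_from_clock_rate(cpu_clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = CPU clock cycles for a program / clock rate."""
    return cpu_clock_cycles / clock_rate_hz


if __name__ == "__main__":
    rate_hz = 1e9      # 1 GHz, so the cycle time is 1 ns
    cycles = 2e9       # hypothetical cycle count for some program
    print(cpu_time_from_cycle_time(cycles, clock_cycle_time_s(rate_hz)))
    print(cpu_time_from_clock_rate(cycles, rate_hz))
```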
24. CPU Performance
• We can also count the number of instructions executed – the
instruction path length or instruction count (IC).
• If we know the number of clock cycles and IC, then we can calculate the average
number of clock cycles per instruction (CPI).
• CPI is computed as
$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}}$$
† This figure provides insight into different styles of instruction sets and
implementations.
• Thus, clock cycles can be defined as IC × CPI, which allows us to use CPI
in the execution time formula:
$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$
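CPI and the resulting CPU time can be computed from raw counts. The numbers below (3 billion cycles, 2 billion instructions, a 2 GHz clock) are hypothetical and only illustrate the arithmetic.

```python
def cycles_per_instruction(cpu_clock_cycles: float, instruction_count: float) -> float:
    """CPI = CPU clock cycles for a program / IC."""
    return cpu_clock_cycles / instruction_count


def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    cycles = 3e9         # hypothetical cycle count
    instructions = 2e9   # hypothetical instruction count
    cpi = cycles_per_instruction(cycles, instructions)
    print("CPI:", cpi)
    print("CPU time (s):", cpu_time_s(instructions, cpi, 2e9))
```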
26. CPU Performance
• How the pieces of CPU time fit together:
$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
• An α% improvement in any one of the three pieces leads to an α% improvement
in CPU time.
– Unfortunately, it is difficult to change one parameter in complete isolation
from the others, because the underlying technologies are interdependent:
• Clock cycle time: Hardware technology and organization;
• CPI: Organization and instruction set architecture;
• Instruction count: Instruction set architecture and compiler technology.
† Processor performance depends on three characteristics:
instruction count, clock cycles per instruction, and clock cycle time (or rate).
† Computer architecture focuses on the CPI and IC parameters.
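Because CPU time is a product of three factors, reducing any one of them by a given percentage reduces CPU time by the same percentage. The sketch below checks this with hypothetical baseline values (1 billion instructions, CPI of 2.0, 1 ns cycle time) and a 10% reduction applied to each factor in turn.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """CPU time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical baseline machine and program.
BASE_IC, BASE_CPI, BASE_CYCLE_S = 1e9, 2.0, 1e-9
baseline = cpu_time_s(BASE_IC, BASE_CPI, BASE_CYCLE_S)

# Reduce each factor by 10% in isolation and record the CPU time ratio.
ratios = {
    "instruction count": cpu_time_s(BASE_IC * 0.9, BASE_CPI, BASE_CYCLE_S) / baseline,
    "CPI": cpu_time_s(BASE_IC, BASE_CPI * 0.9, BASE_CYCLE_S) / baseline,
    "clock cycle time": cpu_time_s(BASE_IC, BASE_CPI, BASE_CYCLE_S * 0.9) / baseline,
}

if __name__ == "__main__":
    for factor, ratio in ratios.items():
        print(f"10% lower {factor}: CPU time is {ratio:.2f} of baseline")
```

In practice the factors interact, so a real design change rarely moves just one of them.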
27. CPU Performance
• The number of total processor clock cycles can be calculated as
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i$$
IC_i: the number of times instruction i is executed in a program.
CPI_i: the average number of clocks per instruction for instruction i.
• CPU time can then be expressed as
$$\text{CPU time} = \left(\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i\right) \times \text{Clock cycle time}$$
– And overall CPI as
$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i$$
† IC_i / Instruction count represents the fraction of occurrences of that instruction in a program.
† It is useful in designing the processor.
Hint: CPI_i should be measured, because it must include pipeline effects, cache
misses, and any other memory
system inefficiencies.
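The overall CPI is a weighted average of the per-class CPI values. The sketch below computes it both ways shown above; the instruction mix (class names, counts, and CPI values) is entirely hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical instruction mix: class name -> (IC_i, CPI_i).
MIX: Dict[str, Tuple[float, float]] = {
    "alu": (50e6, 1.0),
    "load/store": (30e6, 2.0),
    "branch": (20e6, 3.0),
}


def total_cycles(mix: Dict[str, Tuple[float, float]]) -> float:
    """CPU clock cycles = sum over i of IC_i * CPI_i."""
    return sum(ic * cpi for ic, cpi in mix.values())


def overall_cpi(mix: Dict[str, Tuple[float, float]]) -> float:
    """Overall CPI = total cycles / total instruction count."""
    instruction_count = sum(ic for ic, _ in mix.values())
    return total_cycles(mix) / instruction_count


def overall_cpi_weighted(mix: Dict[str, Tuple[float, float]]) -> float:
    """Same CPI, written as sum of (IC_i / instruction count) * CPI_i."""
    instruction_count = sum(ic for ic, _ in mix.values())
    return sum((ic / instruction_count) * cpi for ic, cpi in mix.values())


if __name__ == "__main__":
    print("Total cycles:", total_cycles(MIX))
    print("Overall CPI:", overall_cpi(MIX))
    print("Weighted form:", overall_cpi_weighted(MIX))
```

The weighted form makes it easy to see which instruction class dominates: here loads/stores and branches each contribute as many cycles as all ALU instructions combined, despite being less frequent.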