Memory systems n

The Memory Hierarchy
TopicsTopics
 Storage technologies and trends
 Locality of reference
 Caching in the memory hierarchy
Memory Systems
-1-

Main Memory
• Main memory provides the main storage for a computer.Main memory provides the main storage for a computer.
• Two CPU registers are used to interface the CPU to theTwo CPU registers are used to interface the CPU to the
main memory. These aremain memory. These are
the memory address register (MA R) andthe memory address register (MA R) and
the memory data register (MDR).the memory data register (MDR).
• The MDR is used to hold the data to be stored and / orThe MDR is used to hold the data to be stored and / or
retrieved in / from the memory location whose address isretrieved in / from the memory location whose address is
held in the MARheld in the MAR
-2-

Traditional Architecture
Up to 2k addressable
MDR
MAR
Figure Connection of the memory to the processor.
kbit
address bus
nbit
data bus
Control lines
( , MFC, etc.)
Processor Memory
locations
Word length =n bits
WR /
-3-

conceptual internal organization of a
memory chip
-4-

Internal Organization of Memory Chips
FF
Figure Organization of bit cells in a memory chip.
circui
t
Sense / Write
Address
decoder
FF
CS
cells
Memory
circui
t
Sense / Write
Sense / Write
circuit
Data input/output lines:
A0
A1
A2
A3
W0
W1
W15
b7 b1 b0
WR /
b′7 b′1 b′0
b7 b1 b0
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
-5-

size
• 16 words of 8 bits each: 16x8 memory org.. It has16 words of 8 bits each: 16x8 memory org.. It has
16 external connections: addr. 4, data 8, control: 2,16 external connections: addr. 4, data 8, control: 2,
power/ground: 2power/ground: 2
• 1K memory cells: 128x8 memory, external1K memory cells: 128x8 memory, external
connections: ? 19(7+8+2+2)connections: ? 19(7+8+2+2)
• 1Kx1:? 15 (10+1+2+2)1Kx1:? 15 (10+1+2+2)
-6-

Main memory(cont’d)
• Visualize a typical internal main memory structure asVisualize a typical internal main memory structure as
consisting of rows and columns of basic cells.consisting of rows and columns of basic cells.
• Each cell storing one bit of information.Each cell storing one bit of information.
• Cells belonging to a given row can be assumed to form theCells belonging to a given row can be assumed to form the
bits of a givenbits of a given memory word.memory word.
• Address lines AAddress lines An-1n-1 AAn-n-22 ... A... A11 AA00 are used as inputs to theare used as inputs to the
address decoder in order to generate the word select linesaddress decoder in order to generate the word select lines
WW22
n-1n-1
... W... W11 WW00
-7-

Main memory cont’d
• A given word select line is common to all memory cells in the sameA given word select line is common to all memory cells in the same
row.row.
• At any given time ,the address decoder activates only one word selectAt any given time ,the address decoder activates only one word select
line while deactivating the remaining lines.line while deactivating the remaining lines.
• A word select line is used to enable all cells in a row for read or write.A word select line is used to enable all cells in a row for read or write.
• Data (bit) lines are used to input or output the content s of cells.Data (bit) lines are used to input or output the content s of cells.
• Each memory cell is connected to two data lines. A given data line isEach memory cell is connected to two data lines. A given data line is
common to all cells in a given column.common to all cells in a given column.
-8-

What are these “Cells”?
• They are just circuits capable of storing 1 bit of information.They are just circuits capable of storing 1 bit of information.
• In static CMOS technology, each main memory cell consists ofIn static CMOS technology, each main memory cell consists of
two inverters connected back to back(six transistors)two inverters connected back to back(six transistors)
-9-

How the above circuit works?
• If A =1, then transistor NIf A =1, then transistor N22 will be ON and point B=0, which in turnwill be ON and point B=0, which in turn
will cause transistor Pwill cause transistor P11 to be ON , thus causing point A=1. Thisto be ON , thus causing point A=1. This
represents a cell stable state, call it state 1represents a cell stable state, call it state 1
• If in the figure A =0, then transistor N1 will be ON and point B=1,If in the figure A =0, then transistor N1 will be ON and point B=1,
which in turn will cause transistor Pwhich in turn will cause transistor P22 to be ON , thus causing pointto be ON , thus causing point
B=1. This represents a cell stable state, state 2B=1. This represents a cell stable state, state 2
• Transistors NTransistors N33 and Nand N44 connect the cell to the two data (bit) lines.connect the cell to the two data (bit) lines.
• Normally (if the word select is not activated) transistors are turnedNormally (if the word select is not activated) transistors are turned
off, thus protecting the cell from the signal values carried by the dataoff, thus protecting the cell from the signal values carried by the data
lines.lines.
• The two transistors(NThe two transistors(N33 and Nand N44) are turned on when the word select line) are turned on when the word select line
is activated.is activated.
• What takes place when the two transistors are turned on will dependWhat takes place when the two transistors are turned on will depend
on the intended memory operationon the intended memory operation
-10-

Read Operation
1.Both lines b and b are pre-charged high.1.Both lines b and b are pre-charged high.
2. The word select line is activated, thus turning on both2. The word select line is activated, thus turning on both
transistors N3 and N 4 .transistors N3 and N 4 .
3. Depending on the internal value stored in the cell,3. Depending on the internal value stored in the cell,
point A( B) will lead to the discharge of line b ( bb’ ).point A( B) will lead to the discharge of line b ( bb’ ).
-11-

Write operation:
1. The bit lines are pre-charged such that b( b’ ) = 1(0)1. The bit lines are pre-charged such that b( b’ ) = 1(0)
2. The word select line is activated, thus turning on both2. The word select line is activated, thus turning on both
transistors N3 and N 4 .transistors N3 and N 4 .
3. The bit line pre-charged with 0 will have to force the3. The bit line pre-charged with 0 will have to force the
point A (B ), which has1, to 0.point A (B ), which has1, to 0.
-12-

Main draw back of this organization
• Consider, a 1K X 4 memory chipConsider, a 1K X 4 memory chip
• Using the organization shown above the memory arrayUsing the organization shown above the memory array
should be organized as 1K rows of cells, each consisting ofshould be organized as 1K rows of cells, each consisting of
4 cells4 cells
• The chip will then have to have 10 pins for the address andThe chip will then have to have 10 pins for the address and
4 pins for the data.4 pins for the data.
• However, this doesn’t not lead to the best utilization of theHowever, this doesn’t not lead to the best utilization of the
chip areachip area
-13-

A Memory Chip
Figure Organization of a 1K × 1 memory chip.
CS
Sense/Write
circuitry
array
memory cell
address
5bit row
input/output
Data
5bit
decoder
address
5bit column
address
10bit
output multiplexer
32to1
input demultiplexer
32 32×
WR/
W0
W1
W31
and
-14-

Impact of using different organization
• Consider, for example, the design of a memory subsystem whoseConsider, for example, the design of a memory subsystem whose
capacity is 4K bits.capacity is 4K bits.
• Different organization of the same memory capacity can lead to aDifferent organization of the same memory capacity can lead to a
different number of chip pins requirementdifferent number of chip pins requirement
-15-

Example
• Consider, for example, the design of aConsider, for example, the design of a 4M bytes4M bytes main memorymain memory
subsystem usingsubsystem using 1M bit1M bit chipchip
• The number of required chips is 32 chipsThe number of required chips is 32 chips
• Note that the number of address lines required for the 4M system isNote that the number of address lines required for the 4M system is
22, while the number of data lines is 8.22, while the number of data lines is 8.
• The memory subsystem can be arranged in four rows, each havingThe memory subsystem can be arranged in four rows, each having
eight chips.eight chips.
• The least signiﬁcant 20 address lines AThe least signiﬁcant 20 address lines A1919 toto AA00 are used to address anyare used to address any
of the basic building block 1M single bit chips.of the basic building block 1M single bit chips.
• The high- order two address lines AThe high- order two address lines A2121–A–A2020 are used as inputs to a 2 – 4are used as inputs to a 2 – 4
decoderdecoder
-16-

Static Memories
The circuits are capable of retaining their state as long as power isThe circuits are capable of retaining their state as long as power is
appliedapplied
YX
Word line
Bit lines
Figure A static RAM cell.
b
T2T1
b′
-19-

Static Memories
CMOS cell: lCMOS cell: low power consumptionow power consumption
-20-

Asynchronous DRAMs
• Static RAMs are fast, but they cost more area and are moreStatic RAMs are fast, but they cost more area and are more
expensive.expensive.
• Dynamic RAMs (DRAMs) are cheap and area efficient, but they canDynamic RAMs (DRAMs) are cheap and area efficient, but they can
not retain their state indefinitely – need to be periodically refreshed.not retain their state indefinitely – need to be periodically refreshed.
Figure A singletransistor dynamic memory cell
T
C
Word line
Bit line
-21-

Random-Access Memory (RAM)
Key featuresKey features
• RAM is packaged as a chipRAM is packaged as a chip
• Basic storage unit is a cell (one bit per cell)Basic storage unit is a cell (one bit per cell)
• Multiple RAM chips form a memoryMultiple RAM chips form a memory
Static RAM (Static RAM (SRAMSRAM))
• Each cell stores bit with a six-transistor circuitEach cell stores bit with a six-transistor circuit
• Retains value indefinitely, as long as it is kept poweredRetains value indefinitely, as long as it is kept powered
• Relatively insensitive to disturbances such as electrical noiseRelatively insensitive to disturbances such as electrical noise
• Faster and more expensive than DRAMFaster and more expensive than DRAM
Dynamic RAM (Dynamic RAM (DRAMDRAM))
• Each cell stores bit with a capacitor and transistorEach cell stores bit with a capacitor and transistor
• Value must be refreshed every 10-100 msValue must be refreshed every 10-100 ms
• Sensitive to disturbancesSensitive to disturbances
• Slower and cheaper than SRAMSlower and cheaper than SRAM
-22-

Non-Volatile RAM (NVRAM)
Key Feature: Keeps data when powerKey Feature: Keeps data when power
lostlost
 Several types
 Reading assignment(read from your txtbook and from the
web)
-23-

AA busbus is a collection of parallel wires that carryis a collection of parallel wires that carry
address, data, and control signalsaddress, data, and control signals
Buses are typically shared by multiple devicesBuses are typically shared by multiple devices
main
memory
I/O
bridge
bus interface
ALU
register file
CPU chip
system bus memory bus
Typical Bus Structure Connecting CPU and Memory
-28-

CPU places address A on memory busCPU places address A on memory bus
ALU
register file
bus interface
A
0
Ax
main memory
I/O bridge
%eax
Load operation: movl A, %eax
Memory Read Transaction (1)
-29-

Main memory reads A from memory bus, retrievesMain memory reads A from memory bus, retrieves
word x, and places it on busword x, and places it on bus
ALU
register file
bus interface
x 0
Ax
main memory
%eax
I/O bridge
-30-

CPU reads word x from bus and copies it intoCPU reads word x from bus and copies it into
register %eaxregister %eax
x
ALU
register file
bus interface x
main memory
0
A
%eax
I/O bridge
-31-

CPU places address A on bus; main memoryCPU places address A on bus; main memory
reads it and waits for corresponding data wordreads it and waits for corresponding data word
to arriveto arrive
y
ALU
register file
bus interface
A
main memory
0
A
%eax
I/O bridge
Store operation: movl %eax, A
Memory Write Transaction (1)
-32-

CPU places data word y on busCPU places data word y on bus
y
ALU
register file
bus interface
y
main memory
0
A
%eax
I/O bridge
-33-

Main memory reads data word y from bus andMain memory reads data word y from bus and
stores it at address Astores it at address A
y
ALU
register file
bus interface y
main memory
0
A
%eax
I/O bridge
-34-

Principle of Locality:Principle of Locality:
 Programs tend to reuse data and instructions near
those they have used recently, or that were recently
referenced themselves
 Temporal locality: Recently referenced items are likely
to be referenced in the near future
 Spatial locality: Items with nearby addresses tend to
be referenced close together in time
Locality Example:
• Data
– Reference array elements in succession
(stride-1 reference pattern):
– Reference sum each iteration:
• Instructions
– Reference instructions in sequence:
– Cycle through loop repeatedly:
sum = 0;
for (i = 0; i < n; i++)
sum += a[i];
return sum;
Spatial locality
Spatial locality
Temporal locality
Temporal locality
Locality
-35-

Some fundamental and enduring properties ofSome fundamental and enduring properties of
hardware and software:hardware and software:
 Fast storage technologies cost more per byte and have
less capacity
 Gap between CPU and main memory speed is widening
 Well-written programs tend to exhibit good locality
These fundamental properties complement eachThese fundamental properties complement each
other beautifullyother beautifully
They suggest an approach for organizing memoryThey suggest an approach for organizing memory
and storage systems known as aand storage systems known as a memorymemory
hierarchyhierarchy
Memory Hierarchy
-36-

registers
on-chip L1
cache (SRAM)
main memory
(DRAM)
local secondary storage
(local disks)
Larger,
slower,
and
cheaper
(per byte)
storage
devices
remote secondary storage
(distributed file systems, Web servers)
Local disks hold files
retrieved from disks on
remote network servers
Main memory holds disk
blocks retrieved from local
disks
off-chip L2
cache (SRAM)
L1 cache holds cache lines
retrieved from the L2 cache
memory
CPU registers hold words
retrieved from L1 cache
L2 cache holds cache lines
retrieved from main memory
L0:
L1:
L2:
L3:
L4:
L5:
Smaller,
faster,
and
costlier
(per byte)
storage
devices
An Example Memory Hierarchy
-37-

Cache:Cache: Smaller, faster storage device that acts asSmaller, faster storage device that acts as
staging area for subset of data in a larger,staging area for subset of data in a larger,
slower deviceslower device
Fundamental idea of a memory hierarchy:Fundamental idea of a memory hierarchy:
 For each k, the faster, smaller device at level k serves
as cache for larger, slower device at level k+1
Why do memory hierarchies work?Why do memory hierarchies work?
 Programs tend to access data at level k more often
than they access data at level k+1
 Thus, storage at level k+1 can be slower, and thus
larger and cheaper per bit
 Net effect: Large pool of memory that costs as little as
the cheap storage near the bottom, but that serves
data to programs at rate of the fast storage near the≈
top
Caches
-38-

0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Larger, slower, cheaper storage
device at level k+1 is partitioned
into blocks.
Data is copied between
levels in block-sized transfer
units
8 9 14 3
Smaller, faster, more expensive
device at level k caches a
subset of the blocks from level k+1
Level k:
Level k+1: 4
4
4 10
10
10
Caching in a Memory Hierarchy
-39-

Request
14
Request
12
Program needs object d, which is storedProgram needs object d, which is stored
in some block bin some block b
Cache hitCache hit
 Program finds b in the cache atProgram finds b in the cache at
level k. E.g., block 14level k. E.g., block 14
Cache missCache miss
 b is not at level k, so level k cacheb is not at level k, so level k cache
must fetch it from level k+1.must fetch it from level k+1.
E.g., block 12E.g., block 12
 If level k cache is full, then someIf level k cache is full, then some
current block must be replacedcurrent block must be replaced
(evicted). Which one is the(evicted). Which one is the
“victim”?“victim”?
 Placement policy: where can thewhere can the
new block go? E.g., b mod 4new block go? E.g., b mod 4
 Replacement policy: which blockwhich block
9 3
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Level
k:
Level
k+1:
1414
12
14
4*
4*12
12
0 1 2 3
Request
12
4*4*12
General Caching Concepts
-40-

• Cache memory is intended to give memory speedCache memory is intended to give memory speed
approaching that of the fastest memoriesapproaching that of the fastest memories
availableavailable
Cache Memory Principles
-42-

• When the processor attempts to read a word of memory, a check isWhen the processor attempts to read a word of memory, a check is
made to determine if the word is in the cache. If so, the word ismade to determine if the word is in the cache. If so, the word is
delivered to the processordelivered to the processor
• If not, a block of main memory, consisting of some fixed numberIf not, a block of main memory, consisting of some fixed number
of words, is read into the cache and then the word is delivered toof words, is read into the cache and then the word is delivered to
the processorthe processor
• Because of the phenomenon of locality of reference, when a blockBecause of the phenomenon of locality of reference, when a block
of data is fetched into the cache to satisfy a single memoryof data is fetched into the cache to satisfy a single memory
reference, it is likely that there will be future references to thatreference, it is likely that there will be future references to that
same memory location or to other words in the block.same memory location or to other words in the block.
Cache Memory Principles(cont’d)
-43-

Cache Memory principles(cont’d)
-44-

• Main memory consists of up to 2Main memory consists of up to 2nn
addressable words, with eachaddressable words, with each
word having a unique n-bit address.word having a unique n-bit address.
• For mapping purposes, this memory is considered to consist of aFor mapping purposes, this memory is considered to consist of a
number of fixed-length blocks of K words each.number of fixed-length blocks of K words each.
That is, there areThat is, there are M =2M =2nn
/K/K blocks in main memory.blocks in main memory.
• The cache consists of m blocks, calledThe cache consists of m blocks, called lineslines
• NOTE:NOTE: In referring to the basic unit of the cache, the termIn referring to the basic unit of the cache, the term lineline isis
used, rather than the termused, rather than the term blockblock
• Each line containsEach line contains KK words, plus a tag of a few bitswords, plus a tag of a few bits
• Each line also includes control bits (not shown), such as a bit toEach line also includes control bits (not shown), such as a bit to
indicate whether the line has been modified since being loaded intoindicate whether the line has been modified since being loaded into
the cachethe cache
Cache memory principles(cont’d)
-45-

• The number of lines is considerably less than the number of mainThe number of lines is considerably less than the number of main
memory blocks(m<=M)memory blocks(m<=M)
• If a word in a block of memory is read, that block is transferred toIf a word in a block of memory is read, that block is transferred to
one of the lines of the cache.one of the lines of the cache.
• Because there are more blocks than lines, an individual line can-notBecause there are more blocks than lines, an individual line can-not
be uniquely and permanently dedicated to a particular blockbe uniquely and permanently dedicated to a particular block
• Thus, each line includes aThus, each line includes a tagtag that identifies which particular blockthat identifies which particular block
is currently being stored.is currently being stored.
• TheThe tagtag is usually a portion of the main memory addressis usually a portion of the main memory address
Cache memory principles(cont’d)
-46-

Elements of cache DesignElements of cache Design
-48-

• AA logical cachelogical cache, also known as a virtual cache, stores data using, also known as a virtual cache, stores data using
virtual addressesvirtual addresses
• The processor accesses the cache directly,without going through theThe processor accesses the cache directly,without going through the
MMU(Memory management Unit)MMU(Memory management Unit)
• AdvantageAdvantage of the logical cache is thatof the logical cache is that cache access speed is fastercache access speed is faster
than for a physical cache, because the cache can respond before thethan for a physical cache, because the cache can respond before the
MMU performs an address translationMMU performs an address translation
• DisadvantageDisadvantage? same virtual address in two different applications? same virtual address in two different applications
refers to two different physical addressesrefers to two different physical addresses
Cache Addresses(Logical)
-49-

• A physical cache stores data using main memory physicalA physical cache stores data using main memory physical
addressesaddresses
• The subject of logical versus physical cache is a complex one,The subject of logical versus physical cache is a complex one,
and beyond the scope of this bookand beyond the scope of this book
Cache Addresses(Physical)
-50-

0 1 2 3 4 5 6 70 1 2 3Set Number
Cache
Fully (2-way) Set Direct
Associative Associative Mapped
anywhere anywhere in only into
set 0 block 4
(12 mod 4) (12 mod 8)
0 1 2 3 4 5 6 7 8 9
1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9
2 2 2 2 2 2 2 2 2 2
0 1 2 3 4 5 6 7 8 9
3 3
0 1
Memory
Block Number
block 12
can be placed
Mapping functionsMapping functions
-51-

Main Memory
Direct Mapping Cache Organization
Tag
0
1
2
3
Cache
Indicates block #
Block #0
Block #1
Block #2
Block #3
Block #4
Block #5
Block #6
Block #7
Block #8
Block #9
Block #10
Block #11
Block #12
Block #13
Block #14
Block #15
Direct-Mapping Cache Organization
Destination = (MemBlock Number) modulo
(Total number of CacheBlocks)
(Size = Log2((Total number of
MemBlocks)/(Total number of
CacheBlocks)))
Block #1
Block #2
Block #3
Block #4
Tag Size = log2(16/4)
= log2(4)
0 modulo 4
= 0
-52-

Direct mapping(cont’d)
Compare
Main Memory
Block WordTag
s-r
0
1
2
3
Cache Memory
w
s-r
r
w
s w
s
w
s = Number of wires for row address
w = Number of wires for column address
s-r = log2(Memblocks/CacheBlock)
= Switch
1
1
Miss Hit
(s+w)
-53-

•Address length (s+w) bits
• Number of addressable units =2s+w
words or
bytes
• Block size= line size= 2w
words or bytes
• Number of blocks in main memory (2s+w
)/(2w
)
• Number of lines in cache= m= 2r
• Size of cache 2r+w
words or bytes
• Size of tag (s-r) bit
Direct mapping(cont’d)
-54-

Full-Associative Cache Organization
Tag
0
1
2
3
Cache
Indicates block #
Main Memory
Block #0
Block #1
Block #2
Block #3
Block #4
Block #5
Block #6
Block #7
Block #8
Block #9
Block #10
Block #11
Block #12
Block #13
Block #14
Block #15
Full-Associate Cache Organization
(Size = Log2(Memory Block#))
A memory block can go
to any cache block
-55-

Full Associative Cache Organization(cont’d)
Compare
WordTag
Main Memory
0
1
2
3
Cache Memory
w
s
s+w
s
Log2(CacheSize)
s
w
s = log2(# of MemoryBlocks)
1
1
Miss
Hit
w
-56-

Set Associative Cache Organization
Set-Associative Cache Organization
0
1
0
1
0
1
0
1
0
1
2
3
“Set Number”
(two-way set associative)
Block #0
Block #1
Block #2
Block #3
Block #4
Block #5
Block #6
Block #7
Block #8
Block #9
Block #10
Block #11
Block #12
Block #13
Block #14
Block #15
Main Memory
Block #0
Block #0
Block #0
Block #0
Block #1
Block #1
Block #1
Block #1
Destination = (Memory Block) modulo (# of sets)
-57-

Set Associative Cache Organization
Compare
Main Memory
w
s-d
WordTag Set
0
2
3
Cache Memory
1
d
s-d
s-d = ???
s
w
s+w
s w
1
1
Miss
Hit
w
-58-

Replacement Policy
•In an associative cache, which block from a set should be
evicted when the set becomes full?
• Random
• Least-Recently Used (LRU)
• LRU cache state must be updated on every access
• true implementation only feasible for small sets (2-way)
• pseudo-LRU binary tree often used for 4-8 way
• First-In, First-Out (FIFO)
• used in highly associative caches
• Not-Most-Recently Used (NMRU)
• FIFO with exception for most-recently used block or blocks
This is a second-order effect. Why? Because Replacement only
happens on misses
-59-

THE MEMORY HIERARCHY(revisitedTHE MEMORY HIERARCHY(revisited)
Memory in a computer system is arranged in aMemory in a computer system is arranged in a
hierarchy, as shown in Figurehierarchy, as shown in Figure
-61-

• At the top,we have primary storage, whichAt the top,we have primary storage, which
consists of cache and main memory, andconsists of cache and main memory, and
provides very fast access to data.provides very fast access to data.
• Then comes secondary storage, which consistsThen comes secondary storage, which consists
of slower devices such as magnetic disks.of slower devices such as magnetic disks.
• Tertiary storage is the slowest class of storageTertiary storage is the slowest class of storage
devices;devices;
• for example, optical disks and tapesfor example, optical disks and tapes
-62-

• Slower storage devices such as tapes and disksSlower storage devices such as tapes and disks
play an important role in database systemsplay an important role in database systems
because the amount of data is typically verybecause the amount of data is typically very
largelarge
• Since buying enough main memory to store allSince buying enough main memory to store all
data is prohibitively expensive, we must storedata is prohibitively expensive, we must store
data on tapes and disks and build databasedata on tapes and disks and build database
systems that can retrieve data from lowersystems that can retrieve data from lower
levels of the memory hierarchy into mainlevels of the memory hierarchy into main
memory as needed for processing.memory as needed for processing.
-63-

Why storing data on secondary memory?Why storing data on secondary memory?
COSTCOST
• The cost of a given amount of main memory is about 100The cost of a given amount of main memory is about 100
times the cost of the same amount of disk space, and tapestimes the cost of the same amount of disk space, and tapes
are even less expensive than disks.are even less expensive than disks.
PRIMARY STORAGE IS USUALLY VOLATILEPRIMARY STORAGE IS USUALLY VOLATILE
• Main memory size is very small but the number of dataMain memory size is very small but the number of data
objects may exceed this number!objects may exceed this number!
Further, data must be maintained across program executions.Further, data must be maintained across program executions.
This requires storage devices that retain information whenThis requires storage devices that retain information when
the computer is restarted (after a shutdown or a crash); wethe computer is restarted (after a shutdown or a crash); we
call such storage nonvolatilecall such storage nonvolatile
-64-

Primary storage is usually volatile (although itPrimary storage is usually volatile (although it
is possible to make it nonvolatile by adding ais possible to make it nonvolatile by adding a
battery backup feature), whereas secondarybattery backup feature), whereas secondary
and tertiary storage is nonvolatile.and tertiary storage is nonvolatile.
Tapes are relatively inexpensive and can storeTapes are relatively inexpensive and can store
very large amounts of data.very large amounts of data.
 They are a good choice for archival storage,They are a good choice for archival storage,
i.e, when we need to maintain data for a longi.e, when we need to maintain data for a long
period but do not expect to access it veryperiod but do not expect to access it very
often.often.
-65-

DRAWBACKS OF TAPESDRAWBACKS OF TAPES
 they are sequential access devicesthey are sequential access devices
We must essentially step through all the data inWe must essentially step through all the data in
order and cannot directly access a given locationorder and cannot directly access a given location
on tapeon tape
 This makes tapes unsuitable for storing operationalThis makes tapes unsuitable for storing operational
data , or data that is frequently accessed.data , or data that is frequently accessed.
 Tapes are mostly used to back up operational dataTapes are mostly used to back up operational data
periodicallyperiodically..
-66-

Magnetic disksMagnetic disks
• support direct access to a desired locationsupport direct access to a desired location
• widely used for database applicationswidely used for database applications
• A DBMS provides seamless access to data on disk;A DBMS provides seamless access to data on disk;
• applications need not worry about whether data isapplications need not worry about whether data is
in main memory or diskin main memory or disk
-67-

Structure of a DiskStructure of a Disk
-68-

Disks(cont’d)Disks(cont’d)
Data is stored on disk in units called disk blocksData is stored on disk in units called disk blocks
A disk block is a contiguous sequence of bytes and isA disk block is a contiguous sequence of bytes and is
the unit in which data is written to a disk and readthe unit in which data is written to a disk and read
from a diskfrom a disk
Blocks are arranged in concentric rings called tracks,Blocks are arranged in concentric rings called tracks,
on one or more platterson one or more platters
Tracks can be recorded on one or both surfaces of aTracks can be recorded on one or both surfaces of a
platter; we refer to platters as single-sided or double-platter; we refer to platters as single-sided or double-
sided accordinglysided accordingly
-69-

• The set of all tracks with the same diameter is called aThe set of all tracks with the same diameter is called a
cylindercylinder, because the space occupied by these tracks is, because the space occupied by these tracks is
shaped like a cylinder;shaped like a cylinder;
• Each track is divided into arcs calledEach track is divided into arcs called sectorssectors, whose size is, whose size is
a characteristic of the disk and cannot be changeda characteristic of the disk and cannot be changed
• An array of disk heads, one per recorded surface, is movedAn array of disk heads, one per recorded surface, is moved
as a unit; when one head is positioned over a block, theas a unit; when one head is positioned over a block, the
other heads are in identical positions with respect to theirother heads are in identical positions with respect to their
plattersplatters
-70-

• To read or write a block, a disk head must be positioned onTo read or write a block, a disk head must be positioned on
top of the blocktop of the block
• As the size of a platter decreases, seek times also decreaseAs the size of a platter decreases, seek times also decrease
since we have to move a disk head a smaller distance.since we have to move a disk head a smaller distance.
• Typical platter diameters are 3.5 inches and 5.25 inchesTypical platter diameters are 3.5 inches and 5.25 inches
• A disk controller interfaces a disk drive to the computer. ItA disk controller interfaces a disk drive to the computer. It
implements commands to read or write a sector by movingimplements commands to read or write a sector by moving
the arm assembly and transferring data to and from the diskthe arm assembly and transferring data to and from the disk
surfacessurfaces
-71-

Access time to a diskAccess time to a disk
• The time to access a disk block has several componentsThe time to access a disk block has several components..
SeekSeek timetime is the time taken to move the disk heads to theis the time taken to move the disk heads to the
track on which a desired block is located.track on which a desired block is located.
Rotational delayRotational delay is the waiting time for the desiredis the waiting time for the desired
block to rotate under the disk head;block to rotate under the disk head;
it is the time required for half a rotation on average and isit is the time required for half a rotation on average and is
usually less than seek timeusually less than seek time
-72-

Access time to a disk(cont’d)Access time to a disk(cont’d)
Transfer timeTransfer time is the time to actually read or write the data inis the time to actually read or write the data in
the block once the head is positioned, that is, the timethe block once the head is positioned, that is, the time
for the disk to rotate over the block.for the disk to rotate over the block.
Example The IBM Deskstar 14GPXExample The IBM Deskstar 14GPX
- it is a 3.5 inch diameter,- it is a 3.5 inch diameter,
-14.4 GB hard disk,-14.4 GB hard disk,
-average seek time of 9.1milliseconds (msec)-average seek time of 9.1milliseconds (msec)
- verage rotational delay of 4.17 msec.- verage rotational delay of 4.17 msec.
-seek time from one track to the next is just 2.2 ms-seek time from one track to the next is just 2.2 ms
-73-

Performance Implications of Disk StructurePerformance Implications of Disk Structure
1.1. The unit for data transfer between disk and mainThe unit for data transfer between disk and main
memory is a block;memory is a block;
if a single item on a block is needed, the entire block isif a single item on a block is needed, the entire block is
transferred.transferred.
Reading or writing a disk block is called an I/O (forReading or writing a disk block is called an I/O (for
input/output) operation.input/output) operation.
2. The time to read or write a block varies, depending on2. The time to read or write a block varies, depending on
the location of the datathe location of the data
access time = seek time + rotational delay +access time = seek time + rotational delay +
transfer timetransfer time-74-

Virtual MemoryVirtual Memory
Use main memory as a “cache” for secondary (disk)Use main memory as a “cache” for secondary (disk)
storagestorage
 Managed jointly by CPU hardware and the operatingManaged jointly by CPU hardware and the operating
system (OS)system (OS)
Programs share main memoryPrograms share main memory
 Each gets a private virtual address space holding itsEach gets a private virtual address space holding its
frequently used code and datafrequently used code and data
 Protected from other programsProtected from other programs
CPU and OS translate virtual addresses to physicalCPU and OS translate virtual addresses to physical
addressesaddresses
 VM “block” is called a pageVM “block” is called a page
 VM translation “miss” is called a page faultVM translation “miss” is called a page fault
-75-

Address TranslationAddress Translation
Fixed-size pages (e.g., 4K)Fixed-size pages (e.g., 4K)
-76-

Page Fault PenaltyPage Fault Penalty
On page fault, the page must be fetched from diskOn page fault, the page must be fetched from disk
 Takes millions of clock cyclesTakes millions of clock cycles
 Handled by OS codeHandled by OS code
Try to minimize page fault rateTry to minimize page fault rate
 Fully associative placementFully associative placement
 Smart replacement algorithmsSmart replacement algorithms
-77-

Page TablesPage Tables
Stores placement informationStores placement information
 Array of page table entries, indexed by virtual pageArray of page table entries, indexed by virtual page
numbernumber
 Page table register in CPU points to page table inPage table register in CPU points to page table in
physical memoryphysical memory
If page is present in memoryIf page is present in memory
 Page Table Entry stores the physical page numberPage Table Entry stores the physical page number
 Plus other status bits (referenced, dirty, …)Plus other status bits (referenced, dirty, …)
If page is not presentIf page is not present
 Page Table Entry can refer to location in swap spacePage Table Entry can refer to location in swap space
on diskon disk
-78-

Translation Using a Page TableTranslation Using a Page Table
-79-

Mapping Pages to StorageMapping Pages to Storage
-80-

Replacement and WritesReplacement and Writes
To reduce page fault rate, prefer least-recently usedTo reduce page fault rate, prefer least-recently used
(LRU) replacement(LRU) replacement
 Reference bit (aka use bit) in PTE set to 1 on access toReference bit (aka use bit) in PTE set to 1 on access to
pagepage
 Periodically cleared to 0 by OSPeriodically cleared to 0 by OS
 A page with reference bit = 0 has not been used recentlyA page with reference bit = 0 has not been used recently
Disk writes take millions of cyclesDisk writes take millions of cycles
 Block at once, not individual locationsBlock at once, not individual locations
 Write through is impracticalWrite through is impractical
 Use write-backUse write-back
 Dirty bit in PTE set when page is writtenDirty bit in PTE set when page is written
-81-

Fast Translation Using a TLBFast Translation Using a TLB
Address translation would appear to requireAddress translation would appear to require
extra memory referencesextra memory references
 One to access the PTEOne to access the PTE
 Then the actual memory accessThen the actual memory access
But access to page tables has good localityBut access to page tables has good locality
 So use a fast cache of PTEs within the CPU Called aSo use a fast cache of PTEs within the CPU Called a
Translation Look-aside Buffer (TLB)Translation Look-aside Buffer (TLB)
 Typical: 16–512 PTEs, 0.5–1 cycle for hit, 10–100 cyclesTypical: 16–512 PTEs, 0.5–1 cycle for hit, 10–100 cycles
for miss, 0.01%–1% miss ratefor miss, 0.01%–1% miss rate
 Misses could be handled by hardware or softwareMisses could be handled by hardware or software
-82-

Fast Translation Using a TLBFast Translation Using a TLB
-83-

TLB MissesTLB Misses
If page is in memoryIf page is in memory
 Load the PTE from memory and retryLoad the PTE from memory and retry
 Could be handled in hardwareCould be handled in hardware
 Can get complex for more complicated page tableCan get complex for more complicated page table
structuresstructures
 Or in softwareOr in software
 Raise a special exception, with optimized handlerRaise a special exception, with optimized handler
If page is not in memory (page fault)If page is not in memory (page fault)
 OS handles fetching the page and updating the pageOS handles fetching the page and updating the page
tabletable
 Then restart the faulting instructionThen restart the faulting instruction
-85-

TLB Miss HandlerTLB Miss Handler
TLB miss indicatesTLB miss indicates
 Page present, but PTE not in TLBPage present, but PTE not in TLB
 Page not presetPage not preset
Must recognize TLB miss before destinationMust recognize TLB miss before destination
register overwrittenregister overwritten
 Raise exceptionRaise exception
Handler copies PTE from memory to TLBHandler copies PTE from memory to TLB
 Then restarts instructionThen restarts instruction
 If page not present, page fault will occurIf page not present, page fault will occur
-85-

Page Fault HandlerPage Fault Handler
Use faulting virtual address to find PTEUse faulting virtual address to find PTE
Locate page on diskLocate page on disk
Choose page to replaceChoose page to replace
 If dirty, write to disk first
Read page into memory and update page tableRead page into memory and update page table
Make process runnable againMake process runnable again
 Restart from faulting instruction
-86-

TLB and Cache Interaction
If cache tag usesIf cache tag uses
physical addressphysical address
 Need to translateNeed to translate
before cachebefore cache
lookuplookup
Alternative: useAlternative: use
virtual address tagvirtual address tag
 ComplicationsComplications
due to aliasingdue to aliasing
 Different virtualDifferent virtual
addresses foraddresses for
shared physicalshared physical
addressaddress
-87-

Memory systems n

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Memory systems n

Similaire à Memory systems n (20)

Dernier

Dernier (20)

Memory systems n