The Memory Hierarchy
Topics
 Storage technologies and trends
 Locality of reference
 Caching in the memory hierarchy
Memory Systems
Main Memory
• Main memory provides the main storage for a computer.
• Two CPU registers are used to interface the CPU to the main memory: the memory address register (MAR) and the memory data register (MDR).
• The MDR holds the data to be stored in, or retrieved from, the memory location whose address is held in the MAR.
Traditional Architecture
[Figure: Connection of the memory to the processor. The processor's MAR drives a k-bit address bus (up to 2^k addressable locations); the MDR connects to an n-bit data bus (word length = n bits); control lines (R/W, MFC, etc.) coordinate transfers.]
Conceptual Internal Organization of a Memory Chip
Internal Organization of Memory Chips
[Figure: Organization of bit cells in a memory chip. A 16×8 example: address lines A0-A3 feed an address decoder that drives word lines W0-W15; each row of flip-flop (FF) cells connects through Sense/Write circuits to data input/output lines b7 ... b1 b0 (and their complements b′7 ... b′1 b′0), with R/W and CS as control inputs.]
Size
• 16 words of 8 bits each (a 16×8 organization) requires 16 external connections: 4 address, 8 data, 2 control, and 2 power/ground.
• 1K memory cells organized as 128×8 require 19 external connections (7 + 8 + 2 + 2).
• Organized as 1K×1, the same capacity requires only 15 connections (10 + 1 + 2 + 2).
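The pin-count arithmetic above can be sketched in a few lines of Python; as in the slide, 2 control pins and 2 power/ground pins per chip are assumed:

```python
import math

def external_pins(words, bits_per_word, control=2, power=2):
    """Pin count for a words x bits_per_word memory chip:
    address lines + data lines + control + power/ground."""
    address = math.ceil(math.log2(words))
    return address + bits_per_word + control + power

print(external_pins(16, 8))    # 16x8  -> 4 + 8 + 2 + 2 = 16
print(external_pins(128, 8))   # 128x8 -> 7 + 8 + 2 + 2 = 19
print(external_pins(1024, 1))  # 1Kx1  -> 10 + 1 + 2 + 2 = 15
```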
Main Memory (cont'd)
• Visualize a typical internal main memory structure as consisting of rows and columns of basic cells.
• Each cell stores one bit of information.
• Cells belonging to a given row can be assumed to form the bits of a given memory word.
• Address lines A(n-1) A(n-2) ... A1 A0 are used as inputs to the address decoder in order to generate the word select lines W(2^n - 1) ... W1 W0.
Main Memory (cont'd)
• A given word select line is common to all memory cells in the same row.
• At any given time, the address decoder activates only one word select line while deactivating the remaining lines.
• A word select line is used to enable all cells in a row for read or write.
• Data (bit) lines are used to input or output the contents of cells.
• Each memory cell is connected to two data lines. A given data line is common to all cells in a given column.
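The decoder behaviour described above (exactly one word select line active at a time) can be sketched as a small model; the MSB-first bit ordering is an assumption for illustration:

```python
def word_select(address_bits):
    """One-hot address decoder: n address bits (MSB first) activate
    exactly one of the 2**n word select lines; all others stay low."""
    n = len(address_bits)
    row = int("".join(str(b) for b in address_bits), 2)
    return [1 if i == row else 0 for i in range(2 ** n)]

lines = word_select([0, 1, 1])  # 3-bit address 011 -> word line W3 of 8
print(lines)  # [0, 0, 0, 1, 0, 0, 0, 0]
```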
What are these “Cells”?
• They are just circuits capable of storing 1 bit of information.
• In static CMOS technology, each main memory cell consists of two inverters connected back to back (six transistors).
How Does the Above Circuit Work?
• If A = 1, then transistor N2 will be ON and point B = 0, which in turn will cause transistor P1 to be ON, thus holding point A = 1. This represents a cell stable state; call it state 1.
• If A = 0, then transistor P2 will be ON and point B = 1, which in turn will cause transistor N1 to be ON, thus holding point A = 0. This represents the other cell stable state, state 2.
• Transistors N3 and N4 connect the cell to the two data (bit) lines.
• Normally (if the word select line is not activated) these transistors are turned off, protecting the cell from the signal values carried by the data lines.
• The two transistors (N3 and N4) are turned on when the word select line is activated.
• What takes place when the two transistors are turned on depends on the intended memory operation.
Read Operation
1. Both bit lines b and b′ are pre-charged high.
2. The word select line is activated, thus turning on both transistors N3 and N4.
3. Depending on the value stored in the cell, point A (or B) will lead to the discharge of line b (or b′).
Write Operation
1. The bit lines are pre-charged such that b (b′) = 1 (0).
2. The word select line is activated, thus turning on both transistors N3 and N4.
3. The bit line pre-charged with 0 forces the point A (or B) that holds 1 down to 0.
Main Drawback of This Organization
• Consider a 1K × 4 memory chip.
• Using the organization shown above, the memory array would be organized as 1K rows of cells, each row consisting of 4 cells.
• The chip would then need 10 pins for the address and 4 pins for the data.
• However, this does not lead to the best utilization of the chip area.
A Memory Chip
[Figure: Organization of a 1K × 1 memory chip. A 32 × 32 memory cell array: a 5-bit row address drives a 5-bit decoder selecting word lines W0-W31, while a 5-bit column address drives a 32-to-1 output multiplexer and input demultiplexer through the Sense/Write circuitry, giving a single data input/output line. R/W and CS are the control inputs; the full address is 10 bits.]
Impact of Using a Different Organization
• Consider, for example, the design of a memory subsystem whose capacity is 4K bits.
• Different organizations of the same memory capacity can lead to different chip pin requirements.
Example
• Consider, for example, the design of a 4M-byte main memory subsystem using 1M-bit chips.
• The number of required chips is 32.
• Note that the number of address lines required for the 4M system is 22, while the number of data lines is 8.
• The memory subsystem can be arranged in four rows, each having eight chips.
• The least significant 20 address lines A19 to A0 are used to address any of the basic building block 1M single-bit chips.
• The high-order two address lines A21 and A20 are used as inputs to a 2-to-4 decoder.
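A minimal sketch of this example's arithmetic, assuming a byte-wide (8-bit) data path built from single-bit chips as the slide describes:

```python
import math

def memory_subsystem(total_bytes, chip_bits):
    """How many chips a byte-wide subsystem needs, and how the
    address lines split between the chips and the row decoder."""
    chips = (total_bytes * 8) // chip_bits          # 4M bytes * 8 / 1M bits = 32
    address_lines = int(math.log2(total_bytes))     # 22 for 4M bytes
    chip_address_lines = int(math.log2(chip_bits))  # 20 for a 1M-bit chip
    decoder_lines = address_lines - chip_address_lines  # high-order bits
    return chips, address_lines, chip_address_lines, decoder_lines

print(memory_subsystem(4 * 2**20, 2**20))  # (32, 22, 20, 2)
```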
Static Memories
The circuits are capable of retaining their state as long as power is applied.

[Figure: A static RAM cell. Two cross-coupled inverters (latch points X and Y) connect to bit lines b and b′ through transistors T1 and T2, gated by the word line.]
Static Memories
CMOS cell: low power consumption
Asynchronous DRAMs
• Static RAMs are fast, but they cost more area and are more expensive.
• Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely; they need to be periodically refreshed.

[Figure: A single-transistor dynamic memory cell. A transistor T and capacitor C connect a word line to a bit line.]
Random-Access Memory (RAM)
Key features
• RAM is packaged as a chip
• Basic storage unit is a cell (one bit per cell)
• Multiple RAM chips form a memory
Static RAM (SRAM)
• Each cell stores a bit with a six-transistor circuit
• Retains value indefinitely, as long as it is kept powered
• Relatively insensitive to disturbances such as electrical noise
• Faster and more expensive than DRAM
Dynamic RAM (DRAM)
• Each cell stores a bit with a capacitor and transistor
• Value must be refreshed every 10-100 ms
• Sensitive to disturbances
• Slower and cheaper than SRAM
Non-Volatile RAM (NVRAM)
Key feature: keeps data when power is lost
 Several types
 Reading assignment (read from your textbook and from the web)
Typical Bus Structure Connecting CPU and Memory
A bus is a collection of parallel wires that carry address, data, and control signals.
Buses are typically shared by multiple devices.

[Figure: the CPU chip (register file, ALU) connects via a bus interface to the system bus, through an I/O bridge, onto the memory bus and main memory.]
Memory Read Transaction (1)
CPU places address A on the memory bus.

[Figure: load operation movl A, %eax. The bus interface drives address A onto the bus; main memory holds word x at address A; the destination register is %eax.]
Memory Read Transaction (2)
Main memory reads A from the memory bus, retrieves word x, and places it on the bus.

[Figure: load operation movl A, %eax. Main memory drives word x onto the bus.]
Memory Read Transaction (3)
CPU reads word x from the bus and copies it into register %eax.

[Figure: load operation movl A, %eax. Word x arrives in %eax.]
Memory Write Transaction (1)
CPU places address A on the bus; main memory reads it and waits for the corresponding data word to arrive.

[Figure: store operation movl %eax, A. Register %eax holds word y; the bus carries address A.]
Memory Write Transaction (2)
CPU places data word y on the bus.

[Figure: store operation movl %eax, A. The bus carries word y.]
Memory Write Transaction (3)
Main memory reads data word y from the bus and stores it at address A.

[Figure: store operation movl %eax, A. Main memory now holds y at address A.]
Locality
Principle of Locality:
 Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves
 Temporal locality: recently referenced items are likely to be referenced in the near future
 Spatial locality: items with nearby addresses tend to be referenced close together in time

Locality Example:
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

• Data
  - Referencing array elements in succession (stride-1 reference pattern): spatial locality
  - Referencing sum on each iteration: temporal locality
• Instructions
  - Referencing instructions in sequence: spatial locality
  - Cycling through the loop repeatedly: temporal locality
Memory Hierarchy
Some fundamental and enduring properties of hardware and software:
 Fast storage technologies cost more per byte and have less capacity
 The gap between CPU and main memory speed is widening
 Well-written programs tend to exhibit good locality
These fundamental properties complement each other beautifully.
They suggest an approach for organizing memory and storage systems known as a memory hierarchy.
An Example Memory Hierarchy

[Figure: pyramid from L0 (smaller, faster, costlier per byte) down to L5 (larger, slower, cheaper per byte):
L0: registers. CPU registers hold words retrieved from the L1 cache.
L1: on-chip L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
L2: off-chip L2 cache (SRAM). Holds cache lines retrieved from main memory.
L3: main memory (DRAM). Holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (distributed file systems, Web servers).]
Caches
Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
 For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1
Why do memory hierarchies work?
 Programs tend to access data at level k more often than they access data at level k+1
 Thus, storage at level k+1 can be slower, and thus larger and cheaper per bit
 Net effect: a large pool of memory that costs as little as the cheap storage near the bottom, but that serves data to programs at ≈ the rate of the fast storage near the top
Caching in a Memory Hierarchy

[Figure: the larger, slower, cheaper storage device at level k+1 is partitioned into blocks 0 to 15; the smaller, faster, more expensive device at level k caches a subset of those blocks (here 8, 9, 14, and 3); data is copied between levels in block-sized transfer units (here blocks 4 and 10 in transit).]
General Caching Concepts
Program needs object d, which is stored in some block b.
Cache hit
 Program finds b in the cache at level k. E.g., block 14
Cache miss
 b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12
 If the level k cache is full, then some current block must be replaced (evicted). Which one is the "victim"?
 Placement policy: where can the new block go? E.g., b mod 4
 Replacement policy: which block should be evicted?

[Figure: requests for block 14 (a hit) and block 12 (a miss); the four-block level k cache fetches block 12 from level k+1, evicting the block in slot 12 mod 4 = 0.]
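The placement rule above (block b may go only into slot b mod 4) can be sketched as a toy simulation; the four-slot cache and the request stream are illustrative assumptions, not from the slides:

```python
def simulate(requests, num_cache_blocks=4):
    """Toy cache with the slide's placement policy: block b may live
    only in slot b mod num_cache_blocks, so on a miss the block already
    occupying that slot is the eviction victim."""
    slots = [None] * num_cache_blocks
    outcomes = []
    for b in requests:
        slot = b % num_cache_blocks
        if slots[slot] == b:
            outcomes.append("hit")
        else:
            slots[slot] = b      # miss: fetch from level k+1, evict victim
            outcomes.append("miss")
    return outcomes

# 12 and 8 conflict (both map to slot 0), so they keep evicting each other:
print(simulate([14, 14, 12, 8, 12]))  # ['miss', 'hit', 'miss', 'miss', 'miss']
```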
Cache Memory Principles
• Cache memory is intended to give memory speed approaching that of the fastest memories available.
Cache Memory Principles (cont'd)
• When the processor attempts to read a word of memory, a check is made to determine if the word is in the cache. If so, the word is delivered to the processor.
• If not, a block of main memory, consisting of some fixed number of words, is read into the cache and then the word is delivered to the processor.
• Because of the phenomenon of locality of reference, when a block of data is fetched into the cache to satisfy a single memory reference, it is likely that there will be future references to that same memory location or to other words in the block.
Cache Memory Principles (cont'd)
• Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address.
• For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. That is, there are M = 2^n / K blocks in main memory.
• The cache consists of m blocks, called lines.
• NOTE: in referring to the basic unit of the cache, the term line is used rather than the term block.
• Each line contains K words, plus a tag of a few bits.
• Each line also includes control bits (not shown), such as a bit to indicate whether the line has been modified since being loaded into the cache.
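The block count M = 2^n / K can be checked with a one-liner; the 64K-word, 4-word-block geometry below is an illustrative assumption:

```python
def main_memory_blocks(n, K):
    """M = 2^n / K fixed-length blocks of K words each, per the slide
    (assumes K is a power of two so the division is exact)."""
    return (2 ** n) // K

# A 64K-word memory (n = 16) in 4-word blocks has 16384 blocks:
print(main_memory_blocks(16, 4))  # 16384
```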
Cache Memory Principles (cont'd)
• The number of lines is considerably less than the number of main memory blocks (m ≤ M).
• If a word in a block of memory is read, that block is transferred to one of the lines of the cache.
• Because there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated to a particular block.
• Thus, each line includes a tag that identifies which particular block is currently being stored.
• The tag is usually a portion of the main memory address.
Cache Read Operation

[Figure: cache read operation.]
Elements of Cache Design
Cache Addresses (Logical)
• A logical cache, also known as a virtual cache, stores data using virtual addresses.
• The processor accesses the cache directly, without going through the MMU (Memory Management Unit).
• Advantage of the logical cache: cache access is faster than for a physical cache, because the cache can respond before the MMU performs an address translation.
• Disadvantage? The same virtual address in two different applications refers to two different physical addresses.
Cache Addresses (Physical)
• A physical cache stores data using main memory physical addresses.
• The subject of logical versus physical caches is a complex one, and beyond the scope of this book.
Mapping Functions

[Figure: where memory block 12 (of memory blocks 0 to 31) can be placed in an 8-block cache. Fully associative: anywhere. Two-way set associative (four sets, set numbers 0 to 3): anywhere in set 0, since 12 mod 4 = 0. Direct mapped: only into block 4, since 12 mod 8 = 4.]
Direct-Mapped Cache Organization

[Figure: a 16-block main memory (blocks #0 to #15) mapped onto a 4-line cache; each cache line carries a tag indicating which block it currently holds.]

Destination = (memory block number) modulo (total number of cache blocks)
  e.g., block #0 goes to line 0, since 0 modulo 4 = 0
Tag size = log2((total number of memory blocks) / (total number of cache blocks))
  = log2(16/4) = log2(4) = 2 bits
Direct Mapping (cont'd)

[Figure: direct-mapping address path. An (s+w)-bit address splits into a Tag (s-r bits), a Line field (r bits), and a Word field (w bits); the line field selects a cache line, the stored tag is compared with the address tag, and the comparator output signals Hit or Miss. Here s-r = log2(memory blocks / cache blocks).]
Direct Mapping (cont'd)
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of cache = 2^(r+w) words or bytes
• Size of tag = (s - r) bits
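The field breakdown above can be sketched directly; the geometry used here (s = 6, r = 3, w = 2) is an illustrative assumption, not from the slides:

```python
def direct_mapped_fields(address, s, r, w):
    """Split an (s+w)-bit address into the direct-mapping fields from
    the slide: tag (s-r bits), line number (r bits), word offset (w bits)."""
    word = address & ((1 << w) - 1)
    line = (address >> w) & ((1 << r) - 1)
    tag = address >> (w + r)
    return tag, line, word

# Toy geometry: s = 6 (64 memory blocks), r = 3 (8 cache lines),
# w = 2 (4 words per block) -> the tag is s - r = 3 bits wide.
print(direct_mapped_fields(0b101_011_10, 6, 3, 2))  # (5, 3, 2)
```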
Fully Associative Cache Organization

[Figure: a 16-block main memory (blocks #0 to #15) mapped onto a 4-line cache; a memory block can go into any cache block, so each line's tag must identify the full block number (tag size = log2(number of memory blocks)).]
Fully Associative Cache Organization (cont'd)

[Figure: fully associative address path. An (s+w)-bit address splits into a Tag (s bits, where s = log2(number of memory blocks)) and a Word field (w bits); the tag is compared in parallel against the tags of all cache lines, and the result signals Hit or Miss.]
Set-Associative Cache Organization

[Figure: two-way set-associative organization. A 16-block main memory (blocks #0 to #15) is mapped onto a cache of four sets (set numbers 0 to 3), each set holding two blocks (block #0 and block #1 within the set).]

Destination = (memory block number) modulo (number of sets)
Set-Associative Cache Organization (cont'd)

[Figure: set-associative address path. An (s+w)-bit address splits into a Tag (s-d bits), a Set field (d bits), and a Word field (w bits); the set field selects one set of the cache, the address tag is compared against the tags of all lines in that set, and the result signals Hit or Miss. (s-d = ???)]
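The set-associative field split can be sketched the same way as the direct-mapped one; the geometry (s = 6, d = 2 for four sets, w = 2) is again an illustrative assumption:

```python
def set_associative_fields(address, s, d, w):
    """Split an (s+w)-bit address into the set-associative fields from
    the slide: tag (s-d bits), set number (d bits), word offset (w bits)."""
    word = address & ((1 << w) - 1)
    set_no = (address >> w) & ((1 << d) - 1)
    tag = address >> (w + d)
    return tag, set_no, word

# Toy geometry: s = 6 (64 memory blocks), d = 2 (4 sets), w = 2
# -> the tag is s - d = 4 bits wide.
print(set_associative_fields(0b1010_11_10, 6, 2, 2))  # (10, 3, 2)
```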
Replacement Policy
• In an associative cache, which block from a set should be evicted when the set becomes full?
  • Random
  • Least-Recently Used (LRU)
    • LRU cache state must be updated on every access
    • a true implementation is only feasible for small sets (2-way)
    • a pseudo-LRU binary tree is often used for 4- to 8-way
  • First-In, First-Out (FIFO)
    • used in highly associative caches
  • Not-Most-Recently Used (NMRU)
    • FIFO with an exception for the most-recently used block or blocks
Replacement is a second-order effect. Why? Because replacement only happens on misses.
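As a sketch of the LRU bookkeeping for one set of an associative cache (the two-way set size and the access stream are illustrative assumptions), Python's OrderedDict can serve as the recency list:

```python
from collections import OrderedDict

class LRUSet:
    """One set of an associative cache with LRU replacement: on a miss
    when the set is full, the least-recently used block is evicted."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # insertion order = recency order

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # refresh recency on a hit
            return "hit"
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[block] = True
        return "miss"

s = LRUSet(ways=2)
print([s.access(b) for b in (4, 8, 4, 12, 8)])
# ['miss', 'miss', 'hit', 'miss', 'miss']  (12 evicts 8, then 8 evicts 4)
```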
Secondary Memory
The Memory Hierarchy (revisited)
Memory in a computer system is arranged in a hierarchy, as shown in the figure.
• At the top, we have primary storage, which consists of cache and main memory, and provides very fast access to data.
• Then comes secondary storage, which consists of slower devices such as magnetic disks.
• Tertiary storage is the slowest class of storage devices; for example, optical disks and tapes.
• Slower storage devices such as tapes and disks play an important role in database systems because the amount of data is typically very large.
• Since buying enough main memory to store all data is prohibitively expensive, we must store data on tapes and disks and build database systems that can retrieve data from lower levels of the memory hierarchy into main memory as needed for processing.
Why store data on secondary memory?
COST
• The cost of a given amount of main memory is about 100 times the cost of the same amount of disk space, and tapes are even less expensive than disks.
PRIMARY STORAGE IS USUALLY VOLATILE
• Main memory is very small, but the number of data objects may exceed its capacity!
• Further, data must be maintained across program executions. This requires storage devices that retain information when the computer is restarted (after a shutdown or a crash); we call such storage nonvolatile.
Primary storage is usually volatile (although it is possible to make it nonvolatile by adding a battery backup feature), whereas secondary and tertiary storage are nonvolatile.
Tapes are relatively inexpensive and can store very large amounts of data.
 They are a good choice for archival storage, i.e., when we need to maintain data for a long period but do not expect to access it very often.
Drawbacks of Tapes
 They are sequential access devices: we must essentially step through all the data in order and cannot directly access a given location on tape.
 This makes tapes unsuitable for storing operational data, or data that is frequently accessed.
 Tapes are mostly used to back up operational data periodically.
Magnetic Disks
• support direct access to a desired location
• are widely used for database applications
• A DBMS provides seamless access to data on disk; applications need not worry about whether data is in main memory or on disk.
Structure of a Disk
Disks (cont'd)
Data is stored on disk in units called disk blocks.
A disk block is a contiguous sequence of bytes and is the unit in which data is written to and read from a disk.
Blocks are arranged in concentric rings called tracks, on one or more platters.
Tracks can be recorded on one or both surfaces of a platter; we refer to platters as single-sided or double-sided accordingly.
Disks (cont'd)
• The set of all tracks with the same diameter is called a cylinder, because the space occupied by these tracks is shaped like a cylinder.
• Each track is divided into arcs called sectors, whose size is a characteristic of the disk and cannot be changed.
• An array of disk heads, one per recorded surface, is moved as a unit; when one head is positioned over a block, the other heads are in identical positions with respect to their platters.
Disks (cont'd)
• To read or write a block, a disk head must be positioned on top of the block.
• As the size of a platter decreases, seek times also decrease, since we have to move a disk head a smaller distance.
• Typical platter diameters are 3.5 inches and 5.25 inches.
• A disk controller interfaces a disk drive to the computer. It implements commands to read or write a sector by moving the arm assembly and transferring data to and from the disk surfaces.
Access time to a disk
• The time to access a disk block has several components.
Seek time is the time taken to move the disk heads to the track
on which a desired block is located.
Rotational delay is the waiting time for the desired block to
rotate under the disk head; it is the time required for half a
rotation on average and is usually less than the seek time.
-72-
Access time to a disk (cont’d)
Transfer time is the time to actually read or write the data in
the block once the head is positioned, that is, the time for the
disk to rotate over the block.
Example: the IBM Deskstar 14GPX
- 3.5 inch diameter
- 14.4 GB hard disk
- average seek time of 9.1 milliseconds (msec)
- average rotational delay of 4.17 msec
- seek time from one track to the next is just 2.2 msec
-73-
Performance Implications of Disk Structure
1. The unit for data transfer between disk and main memory is a
block; if a single item on a block is needed, the entire block is
transferred.
Reading or writing a disk block is called an I/O (for
input/output) operation.
2. The time to read or write a block varies, depending on the
location of the data:
access time = seek time + rotational delay + transfer time
-74-
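The formula above can be checked numerically. A minimal Python sketch (the 7,200 RPM spindle speed and the 0.5 msec transfer time below are illustrative assumptions, not figures from the slides; note that 7,200 RPM yields exactly the 4.17 msec average rotational delay quoted for the Deskstar):

```python
def avg_rotational_delay_ms(rpm):
    # On average, the desired block is half a rotation away,
    # so the delay is half the time of one full rotation.
    return (60_000.0 / rpm) / 2

def access_time_ms(seek_ms, rpm, transfer_ms):
    # access time = seek time + rotational delay + transfer time
    return seek_ms + avg_rotational_delay_ms(rpm) + transfer_ms

# A 7,200 RPM spindle gives the 4.17 msec average rotational delay.
print(round(avg_rotational_delay_ms(7200), 2))   # 4.17
# With the 9.1 msec average seek and an assumed 0.5 msec transfer:
print(round(access_time_ms(9.1, 7200, 0.5), 2))  # 13.77
```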
Virtual Memory
Use main memory as a “cache” for secondary (disk) storage
 Managed jointly by CPU hardware and the operating system (OS)
Programs share main memory
 Each gets a private virtual address space holding its
frequently used code and data
 Protected from other programs
CPU and OS translate virtual addresses to physical addresses
 VM “block” is called a page
 VM translation “miss” is called a page fault
-75-
Address Translation
Fixed-size pages (e.g., 4K)
-76-
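With fixed-size 4K pages, translation only replaces the virtual page number with a physical page number; the page offset passes through unchanged. A small sketch:

```python
PAGE_SIZE = 4096    # 4 KB pages, as in the example above
OFFSET_BITS = 12    # log2(4096)

def split_virtual_address(va):
    # A virtual address = virtual page number (VPN) + page offset.
    vpn = va >> OFFSET_BITS
    offset = va & (PAGE_SIZE - 1)
    return vpn, offset

def make_physical_address(ppn, offset):
    # Translation swaps the VPN for a physical page number (PPN);
    # the offset within the page is unchanged.
    return (ppn << OFFSET_BITS) | offset

print(split_virtual_address(0x12345))      # (0x12, 0x345)
print(hex(make_physical_address(0x7, 0x345)))  # 0x7345
```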
Page Fault Penalty
On page fault, the page must be fetched from disk
 Takes millions of clock cycles
 Handled by OS code
Try to minimize page fault rate
 Fully associative placement
 Smart replacement algorithms
-77-
Page Tables
Stores placement information
 Array of page table entries, indexed by virtual page number
 Page table register in CPU points to page table in physical
memory
If page is present in memory
 Page Table Entry stores the physical page number
 Plus other status bits (referenced, dirty, …)
If page is not present
 Page Table Entry can refer to location in swap space on disk
-78-
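The entries described above can be sketched as a minimal page table in Python (field names are illustrative, not tied to any particular ISA):

```python
class PageTableEntry:
    def __init__(self):
        self.valid = False         # page present in physical memory?
        self.ppn = None            # physical page number, if valid
        self.disk_location = None  # swap-space location, otherwise
        self.referenced = False    # status bits
        self.dirty = False

class PageFault(Exception):
    pass

def translate(page_table, vpn):
    pte = page_table[vpn]      # table is indexed by virtual page number
    if not pte.valid:
        raise PageFault(vpn)   # OS must fetch the page from disk
    pte.referenced = True      # record the access for replacement
    return pte.ppn

pt = [PageTableEntry() for _ in range(4)]
pt[2].valid, pt[2].ppn = True, 9
print(translate(pt, 2))  # 9
```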
Translation Using a Page Table
-79-
Mapping Pages to Storage
-80-
Replacement and Writes
To reduce page fault rate, prefer least-recently used (LRU)
replacement
 Reference bit (aka use bit) in PTE set to 1 on access to page
 Periodically cleared to 0 by OS
 A page with reference bit = 0 has not been used recently
Disk writes take millions of cycles
 Block at once, not individual locations
 Write-through is impractical
 Use write-back
 Dirty bit in PTE set when page is written
-81-
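The reference-bit approximation of LRU described above can be sketched as follows (a simplified model that scans from index 0, ignoring the circular scan of a real clock algorithm):

```python
class Page:
    def __init__(self, valid=True, referenced=False, dirty=False):
        self.valid = valid
        self.referenced = referenced
        self.dirty = dirty

def clear_reference_bits(pages):
    # Periodically done by the OS; bits still 0 at fault time mean
    # "not used since the last clearing".
    for page in pages:
        page.referenced = False

def choose_victim(pages):
    # Prefer a resident page whose reference bit is 0.
    for i, page in enumerate(pages):
        if page.valid and not page.referenced:
            return i
    # Every resident page was recently used; fall back to any one.
    return next(i for i, page in enumerate(pages) if page.valid)

pages = [Page(referenced=True), Page(), Page(referenced=True)]
print(choose_victim(pages))  # 1
```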
Fast Translation Using a TLB
Address translation would appear to require extra memory
references
 One to access the PTE
 Then the actual memory access
But access to page tables has good locality
 So use a fast cache of PTEs within the CPU, called a
Translation Look-aside Buffer (TLB)
 Typical: 16–512 PTEs, 0.5–1 cycle for hit, 10–100 cycles for
miss, 0.01%–1% miss rate
 Misses could be handled by hardware or software
-82-
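A TLB can be modeled as a small fully associative cache of VPN-to-PPN mappings consulted before the page-table walk (the FIFO eviction below is a simple stand-in for real hardware replacement policies):

```python
class TLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}  # vpn -> ppn; dicts keep insertion order

    def lookup(self, vpn):
        # Returns the cached PPN, or None on a TLB miss (in which
        # case the PTE must be fetched from the page table).
        return self.entries.get(vpn)

    def insert(self, vpn, ppn):
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            # Full: evict the oldest entry (FIFO).
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpn] = ppn

tlb = TLB(capacity=2)
tlb.insert(1, 10)
tlb.insert(2, 20)
print(tlb.lookup(1))   # 10 (hit)
tlb.insert(3, 30)      # evicts the oldest entry, vpn 1
print(tlb.lookup(1))   # None (miss)
```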
Fast Translation Using a TLB
-83-
TLB Misses
If page is in memory
 Load the PTE from memory and retry
 Could be handled in hardware
 Can get complex for more complicated page table structures
 Or in software
 Raise a special exception, with optimized handler
If page is not in memory (page fault)
 OS handles fetching the page and updating the page table
 Then restart the faulting instruction
-84-
TLB Miss Handler
TLB miss indicates
 Page present, but PTE not in TLB
 Page not present
Must recognize TLB miss before destination register overwritten
 Raise exception
Handler copies PTE from memory to TLB
 Then restarts instruction
 If page not present, page fault will occur
-85-
Page Fault Handler
Use faulting virtual address to find PTE
Locate page on disk
Choose page to replace
 If dirty, write to disk first
Read page into memory and update page table
Make process runnable again
 Restart from faulting instruction
-86-
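The handler steps above can be sketched in code (the disk object and victim-selection callback are hypothetical helpers for illustration, not a real OS API):

```python
class Page:
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.ppn = None
        self.disk_location = None

def handle_page_fault(page_table, faulting_vpn, choose_victim, disk):
    pte = page_table[faulting_vpn]           # PTE for faulting address
    victim_vpn = choose_victim(page_table)   # choose page to replace
    victim = page_table[victim_vpn]
    if victim.dirty:
        # Write-back: flush the modified page to disk first.
        disk.write(victim.disk_location, victim.ppn)
    frame = victim.ppn                       # reuse the freed frame
    victim.valid = False
    disk.read(pte.disk_location, frame)      # read faulting page in
    pte.ppn, pte.valid, pte.dirty = frame, True, False
    # The OS would now make the process runnable again and restart
    # it from the faulting instruction.
```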
TLB and Cache Interaction
If cache tag uses physical address
 Need to translate before cache lookup
Alternative: use virtual address tag
 Complications due to aliasing
 Different virtual addresses for shared physical address
-87-
Memory systems n

  • 1. The Memory Hierarchy TopicsTopics  Storage technologies and trends  Locality of reference  Caching in the memory hierarchy Memory Systems -1-
  • 2. Main Memory • Main memory provides the main storage for a computer.Main memory provides the main storage for a computer. • Two CPU registers are used to interface the CPU to theTwo CPU registers are used to interface the CPU to the main memory. These aremain memory. These are the memory address register (MA R) andthe memory address register (MA R) and the memory data register (MDR).the memory data register (MDR). • The MDR is used to hold the data to be stored and / orThe MDR is used to hold the data to be stored and / or retrieved in / from the memory location whose address isretrieved in / from the memory location whose address is held in the MARheld in the MAR -2-
  • 4. conceptual internal organization of a memory chip -4-
  • 5. Internal Organization of Memory Chips FF Figure Organization of bit cells in a memory chip. circui t Sense / Write Address decoder FF CS cells Memory circui t Sense / Write Sense / Write circuit Data input/output lines: A0 A1 A2 A3 W0 W1 W15 b7 b1 b0 WR / b′7 b′1 b′0 b7 b1 b0 • • • • • • • • • • • • • • • • • • • • • • • • • • • -5-
  • 6. size • 16 words of 8 bits each: 16x8 memory org.. It has16 words of 8 bits each: 16x8 memory org.. It has 16 external connections: addr. 4, data 8, control: 2,16 external connections: addr. 4, data 8, control: 2, power/ground: 2power/ground: 2 • 1K memory cells: 128x8 memory, external1K memory cells: 128x8 memory, external connections: ? 19(7+8+2+2)connections: ? 19(7+8+2+2) • 1Kx1:? 15 (10+1+2+2)1Kx1:? 15 (10+1+2+2) -6-
  • 7. Main memory(cont’d) • Visualize a typical internal main memory structure asVisualize a typical internal main memory structure as consisting of rows and columns of basic cells.consisting of rows and columns of basic cells. • Each cell storing one bit of information.Each cell storing one bit of information. • Cells belonging to a given row can be assumed to form theCells belonging to a given row can be assumed to form the bits of a givenbits of a given memory word.memory word. • Address lines AAddress lines An-1n-1 AAn-n-22 ... A... A11 AA00 are used as inputs to theare used as inputs to the address decoder in order to generate the word select linesaddress decoder in order to generate the word select lines WW22 n-1n-1 ... W... W11 WW00 -7-
  • 8. Main memory cont’d • A given word select line is common to all memory cells in the sameA given word select line is common to all memory cells in the same row.row. • At any given time ,the address decoder activates only one word selectAt any given time ,the address decoder activates only one word select line while deactivating the remaining lines.line while deactivating the remaining lines. • A word select line is used to enable all cells in a row for read or write.A word select line is used to enable all cells in a row for read or write. • Data (bit) lines are used to input or output the content s of cells.Data (bit) lines are used to input or output the content s of cells. • Each memory cell is connected to two data lines. A given data line isEach memory cell is connected to two data lines. A given data line is common to all cells in a given column.common to all cells in a given column. -8-
  • 9. What are these “Cells”? • They are just circuits capable of storing 1 bit of information.They are just circuits capable of storing 1 bit of information. • In static CMOS technology, each main memory cell consists ofIn static CMOS technology, each main memory cell consists of two inverters connected back to back(six transistors)two inverters connected back to back(six transistors) -9-
  • 10. How the above circuit works? • If A =1, then transistor NIf A =1, then transistor N22 will be ON and point B=0, which in turnwill be ON and point B=0, which in turn will cause transistor Pwill cause transistor P11 to be ON , thus causing point A=1. Thisto be ON , thus causing point A=1. This represents a cell stable state, call it state 1represents a cell stable state, call it state 1 • If in the figure A =0, then transistor N1 will be ON and point B=1,If in the figure A =0, then transistor N1 will be ON and point B=1, which in turn will cause transistor Pwhich in turn will cause transistor P22 to be ON , thus causing pointto be ON , thus causing point B=1. This represents a cell stable state, state 2B=1. This represents a cell stable state, state 2 • Transistors NTransistors N33 and Nand N44 connect the cell to the two data (bit) lines.connect the cell to the two data (bit) lines. • Normally (if the word select is not activated) transistors are turnedNormally (if the word select is not activated) transistors are turned off, thus protecting the cell from the signal values carried by the dataoff, thus protecting the cell from the signal values carried by the data lines.lines. • The two transistors(NThe two transistors(N33 and Nand N44) are turned on when the word select line) are turned on when the word select line is activated.is activated. • What takes place when the two transistors are turned on will dependWhat takes place when the two transistors are turned on will depend on the intended memory operationon the intended memory operation -10-
• 11. Read Operation
1. Both lines b and b′ are pre-charged high.
2. The word select line is activated, thus turning on both transistors N3 and N4.
3. Depending on the internal value stored in the cell, point A (B) will lead to the discharge of line b (b′).
-11-
• 12. Write Operation
1. The bit lines are pre-charged such that b (b′) = 1 (0).
2. The word select line is activated, thus turning on both transistors N3 and N4.
3. The bit line pre-charged with 0 will force the point A (B), which holds 1, to 0.
-12-
• 13. Main drawback of this organization
• Consider a 1K × 4 memory chip.
• Using the organization shown above, the memory array would be organized as 1K rows of cells, each consisting of 4 cells.
• The chip would then need 10 pins for the address and 4 pins for the data.
• However, this does not lead to the best utilization of the chip area.
-13-
• 15. Impact of using a different organization
• Consider, for example, the design of a memory subsystem whose capacity is 4K bits.
• Different organizations of the same memory capacity can lead to different chip pin requirements.
-15-
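The pin-count arithmetic on these slides can be sketched as a small function. The name `pin_count` and the fixed 2 control plus 2 power/ground pins are assumptions carried over from the earlier 16×8 example, not a datasheet formula:

```python
def pin_count(words: int, bits: int) -> int:
    """External pins for a (words x bits) chip: address + data +
    2 control + 2 power/ground, per the earlier slides (a sketch)."""
    address_lines = (words - 1).bit_length()  # log2(words) address pins
    return address_lines + bits + 2 + 2

# Two organizations of the same 4K-bit capacity need different pin counts:
print(pin_count(4096, 1))  # 4K x 1: 12 + 1 + 4 = 17 pins
print(pin_count(512, 8))   # 512 x 8: 9 + 8 + 4 = 21 pins
```

The same function reproduces the earlier examples: 16×8 gives 16 pins, 128×8 gives 19, and 1K×1 gives 15.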
• 16. Example
• Consider, for example, the design of a 4M byte main memory subsystem using 1M bit chips.
• The number of required chips is 32.
• Note that the number of address lines required for the 4M system is 22, while the number of data lines is 8.
• The memory subsystem can be arranged in four rows, each having eight chips.
• The least significant 20 address lines A19 to A0 are used to address any of the basic building-block 1M single-bit chips.
• The high-order two address lines A21–A20 are used as inputs to a 2-to-4 decoder.
-16-
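The example's numbers follow directly from the capacities. A small sketch (variable names are illustrative, not from the slide):

```python
CHIP_BITS = 1 * 1024 * 1024        # one 1M-bit chip
TOTAL_BYTES = 4 * 1024 * 1024      # 4M-byte subsystem

chips = TOTAL_BYTES * 8 // CHIP_BITS             # 32M bits / 1M bits = 32 chips
address_lines = (TOTAL_BYTES - 1).bit_length()   # 22 lines for 4M byte addresses
rows = chips // 8                                # four rows of eight 1M x 1 chips
row_select_lines = (rows - 1).bit_length()       # A21-A20 drive the 2-to-4 decoder

print(chips, address_lines, rows, row_select_lines)
```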
• 19. Static Memories
• The circuits are capable of retaining their state as long as power is applied.
Figure: A static RAM cell (word line, bit lines b and b′, transistors T1 and T2).
-19-
• 20. Static Memories
• CMOS cell: low power consumption.
-20-
• 21. Asynchronous DRAMs
• Static RAMs are fast, but they cost more area and are more expensive.
• Dynamic RAMs (DRAMs) are cheap and area efficient, but they cannot retain their state indefinitely; they need to be periodically refreshed.
Figure: A single-transistor dynamic memory cell (transistor T, capacitor C, word line, bit line).
-21-
• 22. Random-Access Memory (RAM)
Key features
• RAM is packaged as a chip.
• The basic storage unit is a cell (one bit per cell).
• Multiple RAM chips form a memory.
Static RAM (SRAM)
• Each cell stores a bit with a six-transistor circuit.
• Retains its value indefinitely, as long as it is kept powered.
• Relatively insensitive to disturbances such as electrical noise.
• Faster and more expensive than DRAM.
Dynamic RAM (DRAM)
• Each cell stores a bit with a capacitor and transistor.
• The value must be refreshed every 10–100 ms.
• Sensitive to disturbances.
• Slower and cheaper than SRAM.
-22-
• 23. Non-Volatile RAM (NVRAM)
• Key feature: keeps data when power is lost.
• Several types.
• Reading assignment (read from your textbook and from the web).
-23-
• 24. Typical Bus Structure Connecting CPU and Memory
• A bus is a collection of parallel wires that carry address, data, and control signals.
• Buses are typically shared by multiple devices.
Figure: CPU chip (register file, ALU, bus interface) connected by the system bus to an I/O bridge, and by the memory bus to main memory.
-28-
• 25. Memory Read Transaction (1)
• CPU places address A on the memory bus.
Load operation: movl A, %eax
-29-
• 26. Memory Read Transaction (2)
• Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
Load operation: movl A, %eax
-30-
• 27. Memory Read Transaction (3)
• CPU reads word x from the bus and copies it into register %eax.
Load operation: movl A, %eax
-31-
• 28. Memory Write Transaction (1)
• CPU places address A on the bus; main memory reads it and waits for the corresponding data word to arrive.
Store operation: movl %eax, A
-32-
• 29. Memory Write Transaction (2)
• CPU places data word y on the bus.
Store operation: movl %eax, A
-33-
• 30. Memory Write Transaction (3)
• Main memory reads data word y from the bus and stores it at address A.
Store operation: movl %eax, A
-34-
• 31. Locality
Principle of Locality:
• Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves.
• Temporal locality: recently referenced items are likely to be referenced in the near future.
• Spatial locality: items with nearby addresses tend to be referenced close together in time.
Locality Example:
• Data
– Reference array elements in succession (stride-1 reference pattern): spatial locality.
– Reference sum each iteration: temporal locality.
• Instructions
– Reference instructions in sequence: spatial locality.
– Cycle through the loop repeatedly: temporal locality.
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;
-35-
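The fragment above can be completed into a small runnable function; the stride-1 traversal of `a` gives spatial locality, while the repeated accesses to the accumulator and to the loop body's instructions give temporal locality:

```python
def sum_array(a):
    # Stride-1 pass over a: spatial locality.
    # total (and the loop's instructions) touched each iteration: temporal locality.
    total = 0
    for x in a:
        total += x
    return total

print(sum_array([1, 2, 3, 4]))
```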
• 32. Memory Hierarchy
• Some fundamental and enduring properties of hardware and software:
– Fast storage technologies cost more per byte and have less capacity.
– The gap between CPU and main memory speed is widening.
– Well-written programs tend to exhibit good locality.
• These fundamental properties complement each other beautifully.
• They suggest an approach for organizing memory and storage systems known as a memory hierarchy.
-36-
• 33. An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices at the top; larger, slower, and cheaper (per byte) storage devices toward the bottom:
L0: registers (CPU registers hold words retrieved from the L1 cache).
L1: on-chip L1 cache (SRAM; holds cache lines retrieved from the L2 cache).
L2: off-chip L2 cache (SRAM; holds cache lines retrieved from main memory).
L3: main memory (DRAM; holds disk blocks retrieved from local disks).
L4: local secondary storage (local disks; hold files retrieved from disks on remote network servers).
L5: remote secondary storage (distributed file systems, Web servers).
-37-
• 34. Caches
• Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
• Fundamental idea of a memory hierarchy:
– For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
• Why do memory hierarchies work?
– Programs tend to access data at level k more often than they access data at level k+1.
– Thus, storage at level k+1 can be slower, and thus larger and cheaper per bit.
– Net effect: a large pool of memory that costs as little as the cheap storage near the bottom, but that serves data to programs at ≈ the rate of the fast storage near the top.
-38-
• 35. Caching in a Memory Hierarchy
• The larger, slower, cheaper storage device at level k+1 is partitioned into blocks (blocks 0–15 in the figure).
• The smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1 (blocks 8, 9, 14, and 3 in the figure, with blocks 4 and 10 shown being copied in).
• Data is copied between levels in block-sized transfer units.
-39-
• 36. General Caching Concepts
• A program needs object d, which is stored in some block b.
• Cache hit
– The program finds b in the cache at level k. E.g., block 14.
• Cache miss
– b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12.
– If the level k cache is full, then some current block must be replaced (evicted). Which one is the “victim”?
– Placement policy: where can the new block go? E.g., b mod 4.
– Replacement policy: which block should be evicted?
-40-
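The slide's "b mod 4" placement policy can be sketched in one line (the function name is illustrative):

```python
def placement(block: int, num_cache_blocks: int = 4) -> int:
    """Where block b may go under the slide's 'b mod 4' placement policy."""
    return block % num_cache_blocks

print(placement(12))  # block 12 maps to slot 0
print(placement(14))  # block 14 maps to slot 2
```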
• 38. Cache Memory Principles
• Cache memory is intended to give memory speed approaching that of the fastest memories available.
-42-
• 39. Cache Memory Principles (cont’d)
• When the processor attempts to read a word of memory, a check is made to determine if the word is in the cache. If so, the word is delivered to the processor.
• If not, a block of main memory, consisting of some fixed number of words, is read into the cache and then the word is delivered to the processor.
• Because of the phenomenon of locality of reference, when a block of data is fetched into the cache to satisfy a single memory reference, it is likely that there will be future references to that same memory location or to other words in the block.
-43-
• 41. Cache memory principles (cont’d)
• Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address.
• For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. That is, there are M = 2^n / K blocks in main memory.
• The cache consists of m blocks, called lines.
• NOTE: in referring to the basic unit of the cache, the term line is used rather than the term block.
• Each line contains K words, plus a tag of a few bits.
• Each line also includes control bits (not shown), such as a bit to indicate whether the line has been modified since being loaded into the cache.
-45-
• 42. Cache memory principles (cont’d)
• The number of lines is considerably less than the number of main memory blocks (m << M).
• If a word in a block of memory is read, that block is transferred to one of the lines of the cache.
• Because there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated to a particular block.
• Thus, each line includes a tag that identifies which particular block is currently being stored.
• The tag is usually a portion of the main memory address.
-46-
• 44. Elements of Cache Design
-48-
• 45. Cache Addresses (Logical)
• A logical cache, also known as a virtual cache, stores data using virtual addresses.
• The processor accesses the cache directly, without going through the MMU (Memory Management Unit).
• Advantage: cache access speed is faster than for a physical cache, because the cache can respond before the MMU performs an address translation.
• Disadvantage: the same virtual address in two different applications refers to two different physical addresses.
-49-
• 46. Cache Addresses (Physical)
• A physical cache stores data using main memory physical addresses.
• The subject of logical versus physical cache is a complex one, and beyond the scope of this book.
-50-
• 47. Mapping functions
Where can memory block 12 be placed in an 8-block cache?
• Fully associative: anywhere.
• (2-way) set associative: anywhere in set 0 (12 mod 4).
• Direct mapped: only into block 4 (12 mod 8).
-51-
• 48. Direct-Mapping Cache Organization
• Main memory of 16 blocks (#0–#15), cache of 4 blocks (#0–#3); the tag indicates which memory block a cache block currently holds.
• Destination = (memory block number) modulo (total number of cache blocks); e.g., 0 modulo 4 = 0.
• Tag size = log2((total number of memory blocks) / (total number of cache blocks)) = log2(16/4) = log2(4) = 2 bits.
-52-
• 49. Direct mapping (cont’d)
Figure: the (s+w)-bit address is split into Tag (s−r bits), line (r bits), and Word (w bits) fields; the tag stored in the selected cache line is compared against the address tag, and the result switches between Hit and Miss.
• s−r = log2(memory blocks / cache blocks).
-53-
• 50. Direct mapping (cont’d)
• Address length = (s + w) bits.
• Number of addressable units = 2^(s+w) words or bytes.
• Block size = line size = 2^w words or bytes.
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s.
• Number of lines in cache = m = 2^r.
• Size of cache = 2^(r+w) words or bytes.
• Size of tag = (s − r) bits.
-54-
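The parameter relationships above can be checked mechanically. A sketch, with the concrete values of s, r, and w chosen purely for illustration:

```python
def direct_mapped_params(s: int, r: int, w: int) -> dict:
    """Derived sizes from the slide's s (block address bits),
    r (line bits), and w (word bits) parameters."""
    return {
        "address_bits": s + w,
        "block_words": 2 ** w,
        "memory_blocks": 2 ** s,
        "cache_lines": 2 ** r,
        "cache_words": 2 ** (r + w),
        "tag_bits": s - r,
    }

# Assumed example: s = 22, r = 8, w = 2 (4-word blocks, 256-line cache).
print(direct_mapped_params(s=22, r=8, w=2))
```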
• 51. Fully Associative Cache Organization
• A memory block can go to any cache block; the tag indicates the block number.
• Tag size = log2(number of memory blocks).
-55-
• 52. Fully Associative Cache Organization (cont’d)
Figure: the address is split into Tag (s bits, where s = log2(number of memory blocks)) and Word (w bits); the address tag is compared in parallel against the tags of all cache lines, and the result switches between Hit and Miss.
-56-
• 53. Set-Associative Cache Organization
• Two-way set associative: the cache is organized as sets of two blocks each (sets 0–3 in the figure, each holding blocks #0 and #1); main memory has 16 blocks (#0–#15).
• Destination = (memory block number) modulo (number of sets).
-57-
• 54. Set-Associative Cache Organization (cont’d)
Figure: the address is split into Tag (s−d bits), Set (d bits), and Word (w bits); the set field selects a set, and the address tag is compared against the tags of all lines in that set, switching between Hit and Miss.
• s−d = log2(memory blocks / number of sets), by analogy with the direct-mapped case.
-58-
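The tag/set/word split above can be sketched with bit operations. The concrete address and field widths here are assumed for illustration only:

```python
def split_address(addr: int, d: int, w: int) -> tuple:
    """Split an (s+w)-bit address into (tag, set, word):
    low w bits select the word, next d bits select the set,
    and the remaining high-order s-d bits form the tag."""
    word = addr & ((1 << w) - 1)
    set_index = (addr >> w) & ((1 << d) - 1)
    tag = addr >> (w + d)
    return tag, set_index, word

# Assumed example: 8-bit address 0b10110111 with d = 2 set bits, w = 2 word bits.
print(split_address(0b10110111, 2, 2))  # tag 0b1011, set 0b01, word 0b11
```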
• 55. Replacement Policy
• In an associative cache, which block from a set should be evicted when the set becomes full?
• Random.
• Least-Recently Used (LRU):
– LRU cache state must be updated on every access.
– A true implementation is only feasible for small sets (2-way).
– A pseudo-LRU binary tree is often used for 4–8 way.
• First-In, First-Out (FIFO): used in highly associative caches.
• Not-Most-Recently Used (NMRU): FIFO with an exception for the most-recently used block or blocks.
• This is a second-order effect. Why? Because replacement only happens on misses.
-59-
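True LRU for one set can be sketched by tracking a last-access time per way and evicting the oldest, which is why the state must be updated on every access. The data layout here is an assumption for illustration:

```python
def lru_victim(last_used: dict) -> int:
    """Pick the way whose last access is oldest (true LRU,
    feasible only for small sets, per the slide).
    last_used maps way number -> last-access timestamp."""
    return min(last_used, key=last_used.get)

# Hypothetical 4-way set: way 2 was touched longest ago, so it is evicted.
access_time = {0: 7, 1: 5, 2: 1, 3: 6}
print(lru_victim(access_time))  # 2
```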
• 57. The Memory Hierarchy (revisited)
• Memory in a computer system is arranged in a hierarchy, as shown in the figure.
-61-
• 58.
• At the top, we have primary storage, which consists of cache and main memory, and provides very fast access to data.
• Then comes secondary storage, which consists of slower devices such as magnetic disks.
• Tertiary storage is the slowest class of storage devices; for example, optical disks and tapes.
-62-
• 59.
• Slower storage devices such as tapes and disks play an important role in database systems because the amount of data is typically very large.
• Since buying enough main memory to store all data is prohibitively expensive, we must store data on tapes and disks and build database systems that can retrieve data from lower levels of the memory hierarchy into main memory as needed for processing.
-63-
• 60. Why store data on secondary memory?
• COST: the cost of a given amount of main memory is about 100 times the cost of the same amount of disk space, and tapes are even less expensive than disks.
• PRIMARY STORAGE IS USUALLY VOLATILE: main memory is small, and the amount of data may exceed it. Further, data must be maintained across program executions. This requires storage devices that retain information when the computer is restarted (after a shutdown or a crash); we call such storage nonvolatile.
-64-
• 61.
• Primary storage is usually volatile (although it is possible to make it nonvolatile by adding a battery backup feature), whereas secondary and tertiary storage are nonvolatile.
• Tapes are relatively inexpensive and can store very large amounts of data.
• They are a good choice for archival storage, i.e., when we need to maintain data for a long period but do not expect to access it very often.
-65-
• 62. Drawbacks of tapes
• They are sequential access devices: we must essentially step through all the data in order and cannot directly access a given location on tape.
• This makes tapes unsuitable for storing operational data, or data that is frequently accessed.
• Tapes are mostly used to back up operational data periodically.
-66-
• 63. Magnetic disks
• Support direct access to a desired location.
• Widely used for database applications.
• A DBMS provides seamless access to data on disk; applications need not worry about whether data is in main memory or on disk.
-67-
• 64. Structure of a Disk
-68-
• 65. Disks (cont’d)
• Data is stored on disk in units called disk blocks.
• A disk block is a contiguous sequence of bytes and is the unit in which data is written to and read from a disk.
• Blocks are arranged in concentric rings called tracks, on one or more platters.
• Tracks can be recorded on one or both surfaces of a platter; we refer to platters as single-sided or double-sided accordingly.
-69-
• 66. Disks (cont’d)
• The set of all tracks with the same diameter is called a cylinder, because the space occupied by these tracks is shaped like a cylinder.
• Each track is divided into arcs called sectors, whose size is a characteristic of the disk and cannot be changed.
• An array of disk heads, one per recorded surface, is moved as a unit; when one head is positioned over a block, the other heads are in identical positions with respect to their platters.
-70-
• 67. Disks (cont’d)
• To read or write a block, a disk head must be positioned on top of the block.
• As the size of a platter decreases, seek times also decrease, since we have to move a disk head a smaller distance.
• Typical platter diameters are 3.5 inches and 5.25 inches.
• A disk controller interfaces a disk drive to the computer. It implements commands to read or write a sector by moving the arm assembly and transferring data to and from the disk surfaces.
-71-
• 68. Access time to a disk
• The time to access a disk block has several components.
• Seek time is the time taken to move the disk heads to the track on which a desired block is located.
• Rotational delay is the waiting time for the desired block to rotate under the disk head; it is the time required for half a rotation on average and is usually less than seek time.
-72-
• 69. Access time to a disk (cont’d)
• Transfer time is the time to actually read or write the data in the block once the head is positioned, that is, the time for the disk to rotate over the block.
• Example: the IBM Deskstar 14GPX
– a 3.5 inch diameter, 14.4 GB hard disk,
– average seek time of 9.1 milliseconds (msec),
– average rotational delay of 4.17 msec,
– seek time from one track to the next of just 2.2 msec.
-73-
• 70. Performance Implications of Disk Structure
1. The unit for data transfer between disk and main memory is a block; if a single item on a block is needed, the entire block is transferred. Reading or writing a disk block is called an I/O (for input/output) operation.
2. The time to read or write a block varies, depending on the location of the data:
access time = seek time + rotational delay + transfer time
-74-
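The access-time formula can be applied to the Deskstar figures from the previous slide. The 0.5 ms transfer time is an assumed illustrative value, since the slide does not give one:

```python
def disk_access_ms(seek_ms: float, rotational_ms: float, transfer_ms: float) -> float:
    """access time = seek time + rotational delay + transfer time."""
    return seek_ms + rotational_ms + transfer_ms

# Deskstar 14GPX averages: 9.1 ms seek, 4.17 ms rotational delay,
# plus an assumed 0.5 ms block transfer time.
print(disk_access_ms(9.1, 4.17, 0.5))  # about 13.77 ms
```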
• 71. Virtual Memory
• Use main memory as a “cache” for secondary (disk) storage.
– Managed jointly by CPU hardware and the operating system (OS).
• Programs share main memory.
– Each gets a private virtual address space holding its frequently used code and data.
– Protected from other programs.
• CPU and OS translate virtual addresses to physical addresses.
– A VM “block” is called a page.
– A VM translation “miss” is called a page fault.
-75-
• 72. Address Translation
• Fixed-size pages (e.g., 4K).
-76-
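With fixed-size 4K pages, translation splits a virtual address into a virtual page number and an offset, looks the page number up, and carries the offset over unchanged. A sketch using a plain dict as a stand-in page table; the particular mapping is hypothetical:

```python
PAGE_SIZE = 4096  # fixed-size 4K pages, as on the slide

def translate(vaddr: int, page_table: dict) -> int:
    """Map a virtual address to a physical address; page_table maps
    virtual page number -> physical page number (a toy stand-in)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset

# Hypothetical mapping: virtual page 2 -> physical page 5.
print(hex(translate(0x2ABC, {2: 5})))  # 0x5abc: offset 0xABC is unchanged
```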
• 73. Page Fault Penalty
• On a page fault, the page must be fetched from disk.
– Takes millions of clock cycles.
– Handled by OS code.
• Try to minimize the page fault rate.
– Fully associative placement.
– Smart replacement algorithms.
-77-
• 74. Page Tables
• Store placement information.
– An array of page table entries, indexed by virtual page number.
– A page table register in the CPU points to the page table in physical memory.
• If the page is present in memory:
– The page table entry stores the physical page number.
– Plus other status bits (referenced, dirty, …).
• If the page is not present:
– The page table entry can refer to a location in swap space on disk.
-78-
• 75. Translation Using a Page Table
-79-
• 76. Mapping Pages to Storage
-80-
• 77. Replacement and Writes
• To reduce the page fault rate, prefer least-recently used (LRU) replacement.
– A reference bit (aka use bit) in the PTE is set to 1 on access to the page.
– Periodically cleared to 0 by the OS.
– A page with reference bit = 0 has not been used recently.
• Disk writes take millions of cycles.
– Write a block at once, not individual locations.
– Write-through is impractical; use write-back.
– A dirty bit in the PTE is set when the page is written.
-81-
  • 78. Fast Translation Using a TLB
 Address translation would appear to require extra memory references
   One to access the PTE
   Then the actual memory access
 But access to page tables has good locality
   So use a fast cache of PTEs within the CPU, called a Translation Look-aside Buffer (TLB)
   Typical: 16–512 PTEs, 0.5–1 cycle for a hit, 10–100 cycles for a miss, 0.01%–1% miss rate
   Misses can be handled by hardware or software
-82-
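A minimal direct-mapped TLB probe, assuming 4 KiB pages and a 16-entry TLB (within the 16-512 range quoted above); real TLBs are usually set- or fully associative, and all names here are illustrative:

```c
#include <stdint.h>

#define PAGE_SHIFT  12                /* 4 KiB pages (assumed) */
#define TLB_ENTRIES 16

typedef struct { int valid; uint32_t vpn, ppn; } tlb_entry_t;
static tlb_entry_t tlb[TLB_ENTRIES];

/* Direct-mapped TLB probe: returns 1 on a hit and fills *ppn,
 * or 0 on a miss (fall back to the page-table walk). */
int tlb_lookup(uint32_t vaddr, uint32_t *ppn) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) {  /* tag must match, not just index */
        *ppn = e->ppn;
        return 1;
    }
    return 0;
}

/* On a miss, hardware or the software miss handler refills the entry
 * from the page table so the retried access hits. */
void tlb_refill(uint32_t vpn, uint32_t ppn) {
    tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
    e->valid = 1;
    e->vpn = vpn;
    e->ppn = ppn;
}
```

Because the probe indexes a small on-chip array, a hit costs well under a memory access, which is how the TLB hides the extra reference the page table would otherwise require.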
  • 79. Fast Translation Using a TLB -83-
  • 80. TLB Misses
 If the page is in memory
   Load the PTE from memory and retry
   Could be handled in hardware
     Can get complex for more complicated page table structures
   Or in software
     Raise a special exception, with an optimized handler
 If the page is not in memory (page fault)
   OS handles fetching the page and updating the page table
   Then restart the faulting instruction
-84-
  • 81. TLB Miss Handler
 A TLB miss indicates either
   Page present, but PTE not in the TLB
   Page not present
 Must recognize the TLB miss before the destination register is overwritten
   Raise an exception
 Handler copies the PTE from memory to the TLB
   Then restarts the instruction
   If the page is not present, a page fault will occur
-85-
  • 82. Page Fault Handler
 Use the faulting virtual address to find the PTE
 Locate the page on disk
 Choose a page to replace
   If dirty, write it to disk first
 Read the page into memory and update the page table
 Make the process runnable again
   Restart from the faulting instruction
-86-
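The handler steps above can be sketched as a toy simulation. The round-robin victim choice stands in for a real replacement policy, the disk I/O is reduced to a counter, and every name here is hypothetical:

```c
enum { NPAGES = 8, NFRAMES = 2 };     /* tiny machine for the sketch */

typedef struct { int valid, dirty, frame; } pte_t;

static pte_t pt[NPAGES];
static int frame_owner[NFRAMES] = { -1, -1 };  /* VPN resident in each frame */
static int next_victim;               /* trivial round-robin replacement */
static int disk_writes;               /* counts write-backs for the sketch */

/* Page fault handler, following the slide's steps: choose a frame to
 * reuse, write its old page back if dirty, "read" the faulting page
 * into it, and update both page table entries. */
void handle_page_fault(int vpn) {
    int f = next_victim;              /* choose a page to replace */
    next_victim = (next_victim + 1) % NFRAMES;

    int old = frame_owner[f];
    if (old >= 0) {                   /* frame occupied: evict */
        if (pt[old].dirty)
            disk_writes++;            /* if dirty, write to disk first */
        pt[old].valid = 0;
    }

    /* (disk read of page vpn into frame f would happen here) */
    pt[vpn].valid = 1;                /* update the page table */
    pt[vpn].dirty = 0;
    pt[vpn].frame = f;
    frame_owner[f] = vpn;
    /* process is made runnable again and restarts the faulting instruction */
}
```

Note that write-back happens only at eviction, matching the previous slide's point that write-through to disk would be impractical.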
  • 83. TLB and Cache Interaction
 If the cache tag uses the physical address
   Need to translate before cache lookup
 Alternative: use a virtual address tag
   Complications due to aliasing
     Different virtual addresses for a shared physical address
-87-