This is a project implemented in VHDL. It is the design of a multi-level cache memory for a uni-processor system. The document also includes some of the simulation and synthesis results.
Project Report Multilevel Cache
2008
TERM PROJECT
Design of Multilevel Cache Memory using VHDL
Anish Goel
216-67-817
FALL-08
NJIT
Computer Systems Architecture
Instructor: Prof. S.G. Ziavras
1. Problem Statement:
To design a multilevel cache memory for a uni-processor system using VHDL.
Cache Memory Specifications:
CACHE       SIZE     MAPPING
L1 Cache    16KB     4-way set associative
L2 Cache    128KB    8-way set associative
Features:
Unified I & D cache at both levels L1 and L2
Set associative mapping
Write through policy
Common cache controller for L1 and L2
The project aims to design the above-mentioned cache memory hierarchy for a uni-processor system and to
obtain simulation results using the ModelSim platform. In addition, the Xilinx ISE platform is used to
synthesize the VHDL code and to view the synthesized system.
2. Design Description:
The design consists of two levels of cache memory as Level 1 (L1) and Level 2 (L2) and a cache controller
that communicates between microprocessor and cache memories to carry out all memory related
operations. The size and specifications of the cache memories are stated in the problem specification
above and the design approach is described in the next section.
Figure 1 shows the block diagram of the designed system.
[Block diagram: the microprocessor, the cache controller, the Level 1 cache memory and the Level 2 cache
memory, connected through the system buses.]
Figure 1: System block layout
The functionality of the design is explained below:
1. The cache controller receives the address that the microprocessor wants to access.
2. The cache controller looks for the address in the L1 cache.
3. If the address is in the L1 cache (a cache hit occurs in L1), the data from that location is provided to
the microprocessor via the data bus.
4. If the address is not found in the L1 cache, a cache miss occurs in L1.
5. The cache controller then looks for the same address in the L2 cache.
6. If the address is found in the L2 cache (a cache hit occurs in L2), the data from that location is provided
to the microprocessor and the same data is also copied into the L1 cache.
7. If the address is not found in the L2 cache, a cache miss occurs in L2.
8. The controller then has to request the same address from the main memory. This functionality is not
modeled in the project; instead, the cache controller signals to the microprocessor that a cache
miss has occurred in the L2 cache. The microprocessor should then take appropriate action.
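The lookup order in the steps above can be sketched as a small VHDL package. This is only an illustrative sketch and not part of the project code: the package name lookup_sketch, the type lookup_result and the functions any_set and resolve_lookup are hypothetical names. It assumes one hit line per way, as in the cache_hit vectors used in Appendix A.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical helper package: all names here are assumptions made for illustration.
package lookup_sketch is
  type lookup_result is (HIT_L1, HIT_L2, MISS_ALL);

  -- l1_hit and l2_hit carry one hit bit per way (4 bits for L1, 8 bits for L2).
  function resolve_lookup(l1_hit : std_logic_vector;
                          l2_hit : std_logic_vector) return lookup_result;
end package lookup_sketch;

package body lookup_sketch is

  -- Returns true if at least one way reports a hit.
  function any_set(v : std_logic_vector) return boolean is
  begin
    for i in v'range loop
      if v(i) = '1' then
        return true;
      end if;
    end loop;
    return false;
  end function any_set;

  -- L1 is checked first; L2 is consulted only after an L1 miss (steps 2 to 7).
  function resolve_lookup(l1_hit : std_logic_vector;
                          l2_hit : std_logic_vector) return lookup_result is
  begin
    if any_set(l1_hit) then
      return HIT_L1;     -- step 3: data comes from L1
    elsif any_set(l2_hit) then
      return HIT_L2;     -- step 6: data comes from L2 and is copied into L1
    else
      return MISS_ALL;   -- step 8: the miss is reported to the microprocessor
    end if;
  end function resolve_lookup;

end package body lookup_sketch;
```

A controller could call resolve_lookup(cache_hit_l1, cache_hit_l2) and map HIT_L1, HIT_L2 and MISS_ALL onto its DAV and miss outputs.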
L1 and L2 specifications:
Physical Address: 32-bit
L1 Cache:
Refer to Figure 2 for the internal architecture of the L1 cache.
Address Format (fields)
Word Size: 32-bit (4 bytes)
Tag: 22-bit
Set Address: 8-bit
Word: 2-bit
Physical Memory Address: 32-bit
| TAG: 22-bit (A31-A10) | SET address: 8-bit (A9-A2) | WORD: 2-bit (A1-A0) |
L1 Cache Memory:
16KB 4-way set associative unified instruction and data cache.
Number of sets: 256 (one line per set in each way)
Total number of lines: 256 x 4 ways = 1024 lines
L2 Cache:
Refer to Figure 3 for the internal architecture of the L2 cache.
Address Format (fields)
Word Size: 32-bit (4 bytes)
Tag: 20-bit
Set Address: 10-bit
Word: 2-bit
Physical Memory Address: 32-bit
| TAG: 20-bit (A31-A12) | SET address: 10-bit (A11-A2) | WORD: 2-bit (A1-A0) |
L2 Cache Memory:
128KB 8-way set associative unified instruction and data cache.
Number of sets: 1024 (one line per set in each way)
Total number of lines: 1024 x 8 ways = 8192 lines
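The field widths above follow from the cache geometry: the number of sets is the capacity divided by (ways x line size), and the tag takes whatever address bits remain after the set and word fields. The package below is a sketch of that arithmetic and of the corresponding address slicing. The names cache_addr_sketch, cache_addr_check and the functions are hypothetical and do not appear in the project files; the line size of 16 bytes is taken from the 4-word, 32-bit-word line described in this report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical package: names are assumptions made for illustration.
package cache_addr_sketch is
  constant ADDR_BITS  : natural := 32;
  constant WORD_BITS  : natural := 2;    -- selects 1 of 4 words in a line
  constant LINE_BYTES : natural := 16;   -- 4 words x 4 bytes

  constant L1_SETS     : natural := (16 * 1024) / (4 * LINE_BYTES);    -- 256
  constant L1_SET_BITS : natural := 8;                                 -- 2**8 = 256
  constant L1_TAG_BITS : natural := ADDR_BITS - L1_SET_BITS - WORD_BITS; -- 22

  constant L2_SETS     : natural := (128 * 1024) / (8 * LINE_BYTES);   -- 1024
  constant L2_SET_BITS : natural := 10;                                -- 2**10 = 1024
  constant L2_TAG_BITS : natural := ADDR_BITS - L2_SET_BITS - WORD_BITS; -- 20

  function l1_set(addr : std_logic_vector(ADDR_BITS-1 downto 0)) return std_logic_vector;
  function l1_tag(addr : std_logic_vector(ADDR_BITS-1 downto 0)) return std_logic_vector;
  function l2_set(addr : std_logic_vector(ADDR_BITS-1 downto 0)) return std_logic_vector;
  function l2_tag(addr : std_logic_vector(ADDR_BITS-1 downto 0)) return std_logic_vector;
end package cache_addr_sketch;

package body cache_addr_sketch is
  function l1_set(addr : std_logic_vector(ADDR_BITS-1 downto 0)) return std_logic_vector is
  begin
    return addr(WORD_BITS + L1_SET_BITS - 1 downto WORD_BITS);   -- A9-A2
  end function l1_set;

  function l1_tag(addr : std_logic_vector(ADDR_BITS-1 downto 0)) return std_logic_vector is
  begin
    return addr(ADDR_BITS - 1 downto WORD_BITS + L1_SET_BITS);   -- A31-A10
  end function l1_tag;

  function l2_set(addr : std_logic_vector(ADDR_BITS-1 downto 0)) return std_logic_vector is
  begin
    return addr(WORD_BITS + L2_SET_BITS - 1 downto WORD_BITS);   -- A11-A2
  end function l2_set;

  function l2_tag(addr : std_logic_vector(ADDR_BITS-1 downto 0)) return std_logic_vector is
  begin
    return addr(ADDR_BITS - 1 downto WORD_BITS + L2_SET_BITS);   -- A31-A12
  end function l2_tag;
end package body cache_addr_sketch;

use work.cache_addr_sketch.all;

-- Elaborating this empty entity checks that the set-bit counts match the set counts.
entity cache_addr_check is
end cache_addr_check;

architecture check of cache_addr_check is
begin
  assert 2**L1_SET_BITS = L1_SETS report "L1 set field width mismatch" severity failure;
  assert 2**L2_SET_BITS = L2_SETS report "L2 set field width mismatch" severity failure;
end check;
```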
L1 Cache Memory Architecture
[Block diagram: the 32-bit address bus (A0-A31) is split into the word address (A0-A1), the set address
(A2-A9) and the tag (A10-A31). In each of the four ways (Way 0 to Way 3), the set address selects one of
256 entries (Set 0 to Set 255) in the tag address directory (T0-T21) and in the cache data memory
(D0-D127). The tag address comparator, when enabled, compares A10-A31 with T0-T21 and produces the
Hit/Miss signal. The data buffer receives the 4-word line and uses A0-A1 to output one 32-bit data word.]
Figure 2: L1 Cache Block Diagram
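L2 Cache Memory Architecture
[Block diagram: the 32-bit address bus (A0-A31) is split into the word address (A0-A1), the set address
(A2-A11) and the tag (A12-A31, consistent with the 20-bit tag field). In each of the eight ways (Way 0 to
Way 7), the set address selects one of 1024 entries (Set 0 to Set 1023) in the tag address directory
(T0-T19) and in the cache data memory (D0-D127). The tag address comparator, when enabled, compares the
tag bits of the address with T0-T19 and produces the Hit/Miss signal. The data buffer receives the 4-word
line and uses A0-A1 to output one 32-bit data word.]
Figure 3: L2 Cache Block Diagram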
Cache Controller
The following diagram depicts all the signals of the cache controller that are used to carry out all the
memory related operations between microprocessor and L1 and L2 cache.
[Signal diagram of the cache controller. Signals shown: Reset Controller, Controller Busy, DAV_L1, DAV_L2,
Address Request (from the microprocessor), Cache Address Bus A31-A0 (also to main memory), Cache Hit/Miss
(L1) from each block, Cache Hit/Miss (L2) from each block, controller data bus D0-D31, L1 Enable, L2 Enable,
Read, Write, and the address and data bus to the L1 and L2 caches.]
Figure 4: Cache Controller Signals
DAV_L1/L2: Data valid from L1 or L2 cache memory on the system data bus when a cache hit occurs in the
corresponding block.
3. Design Approach:
The project is designed using a mixed style of modeling in VHDL. The ModelSim SE PLUS 6.2c platform from
Mentor Graphics is used as the design platform and simulator. The Xilinx ISE 9.1i platform is used to
synthesize the design.
The basic storage element in the memory is modeled using a D flip-flop. Each D flip-flop stores a single bit.
Arrays of this storage element are constructed using the structural style of modeling in VHDL to form registers
(for example, the 22-bit tag register), and these registers are in turn used to create the complete memory array.
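The structural construction described above can be sketched with a generic-width register, so that one description covers both the 22-bit tag register and the 128-bit data line. This is a sketch under assumptions, not the project code: the entity names bit_cell and reg_n and the generic WIDTH are hypothetical, and bit_cell is a plain enabled flip-flop without the tri-state output used by the dff in Appendix A.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical single-bit storage element (assumed name: bit_cell).
entity bit_cell is
  port ( d   : in  std_logic;
         clk : in  std_logic;
         en  : in  std_logic;
         q   : out std_logic);
end bit_cell;

architecture Behavioral of bit_cell is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        q <= d;          -- store the input bit when enabled
      end if;
    end if;
  end process;
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical WIDTH-bit register built structurally from bit_cell instances.
entity reg_n is
  generic (WIDTH : positive := 22);   -- 22 for an L1 tag, 128 for a data line
  port ( D   : in  std_logic_vector(WIDTH-1 downto 0);
         clk : in  std_logic;
         en  : in  std_logic;
         Q   : out std_logic_vector(WIDTH-1 downto 0));
end reg_n;

architecture Structural of reg_n is
begin
  -- One storage cell per bit, all sharing the same clock and enable.
  gen: for i in 0 to WIDTH-1 generate
    cell: entity work.bit_cell
      port map (d => D(i), clk => clk, en => en, q => Q(i));
  end generate;
end Structural;
```

Instantiating reg_n with generic map (WIDTH => 128) would give a register of the same width as a cache data line.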
The memory consists of L1 cache that is arranged as follows:
L1 Cache capacity details
Cache data memory: Word size = 32 bits
Line size = 128 bits (4 words)
No. of lines = 256 per block
Thus total capacity is = 256*4 = 1KWords (4KB) {Per way}
Thus for 4 way set associative cache memory:
Total capacity is 1KWord x 4 = 4KWord (16KB)
Cache Tag memory: Tag size = 22 bits
Cache Tag comparator: 22 bit comparator
Input Output Buffer: 128 bits
The L2 cache is designed using the same concept, except that the L2 cache is much larger than the L1 cache
and is an 8-way set associative cache.
L2 Cache capacity details
Cache data memory: Word size = 32 bits
Line size = 128 bits (4 words)
No. of lines = 1024 per block
Thus total capacity is = 1024*4 = 4KWords (16KB) {per way}
Thus for 8 way set associative cache memory:
Total capacity is 4KWord x 8 = 32KWord (128KB)
Cache Tag memory: Tag size = 20 bits
Cache Tag comparator: 20 bit comparator
Input Output Buffer: 128 bits
All the operations in the L1 and L2 caches are directed by a cache controller. Any address request from the
microprocessor is first sent to the cache controller. The cache controller then looks for the address in
the L1 cache. If a cache hit occurs in L1, the data from the requested location is transferred to the
microprocessor. If a cache miss occurs in L1, the cache controller looks for the same address in the L2
cache. If a cache hit occurs in L2, the controller transfers the data to the microprocessor as well
as to the L1 cache.
4. Simulation Results:
The following figures depict the simulation results of the higher entities like the L1, L2 cache and cache
controller.
The results of discrete blocks such as the memory decoder and the cache tag comparator are shown in Appendix B.
L1 Cache:
Cache Miss in L1
Figure 5: Simulation Results L1 Cache
L2 Cache
Cache hit in Way 5 L2 cache (for same address)
Figure 6: Simulation Results L2 Cache
Cache Controller:
Figure 7: Simulation Results Cache Controller
The instants marked 1, 2 and 3 in Figure 7 are:
1: Cache hit in the L1 cache for the specified address.
2: Cache miss in the L1 cache for an address different from the one in instant 1.
3: Cache hit in the L2 cache for the same address as in instant 2.
Important Note: The above simulation results are obtained for specific locations chosen to test the
functionality of the memory hierarchy. The data was previously stored at these addresses. In a real system,
however, the address requests from the microprocessor depend on the program code, and the microprocessor
generates addresses continuously in a pattern determined by the nature of the program. Thus, to test the
performance of this cache, complete hardware that carries out the functionality of the microprocessor is needed.
5. Observations:
1. The Level 1 and Level 2 cache memories give the correct results at the output signals cache_hit and
Cache_miss when the tag part of the address requested by the microprocessor matches the corresponding
entry in the cache directory.
2. The read/write pins do not have any signals (logic levels) on them, because the microprocessor must
specify whether a read operation or a write operation is to occur.
3. The cache controller delivers the appropriate signals to the cache memories L1 and L2 to match the
tag part of the address requested by the microprocessor. If a cache hit occurs, it indicates to the
microprocessor by means of DAV_L1 or DAV_L2 (Data Valid) that the data on the data bus is the valid
data from the requested address.
4. A cache hit in the L1 or L2 cache directly outputs the data from the requested address to the data bus.
This is not visible in the above simulation results, because many of the signals are activated in the
internal architecture and are not visible at the higher levels of the hierarchy.
5. To observe the results mentioned in point 4 above, some of the blocks, such as the tag comparator and
the output buffer, need to be simulated separately. Some of these results are shown in Appendix B.
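Simulating a block separately, as point 5 suggests, can be done with a small self-checking testbench. The sketch below exercises the tag comparator. The comparator entity is repeated from Appendix A only so that the example compiles on its own; the testbench name tb_tag_comparator, its signal names and the 10 ns delays are assumptions made for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Copy of the comparator from Appendix A, repeated so the sketch is self-contained.
entity tag_comparator is
  port( Addr_req: in std_logic_vector (21 downto 0);
        Addr_tag: in std_logic_vector (21 downto 0);
        tag_hit: out std_logic;
        EN: in std_logic);
end tag_comparator;

architecture Behavioral of tag_comparator is
begin
  process(Addr_req, Addr_tag, EN)
  begin
    tag_hit <= '0';
    if EN = '1' then
      if Addr_req = Addr_tag then
        tag_hit <= '1';
      end if;
    end if;
  end process;
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical testbench: names and timing are assumptions.
entity tb_tag_comparator is
end tb_tag_comparator;

architecture sim of tb_tag_comparator is
  signal req, tag : std_logic_vector(21 downto 0) := (others => '0');
  signal en       : std_logic := '0';
  signal hit      : std_logic;
begin
  dut: entity work.tag_comparator
    port map (Addr_req => req, Addr_tag => tag, tag_hit => hit, EN => en);

  stimulus: process
  begin
    -- Matching tags, comparator disabled: no hit expected.
    req <= (others => '1');
    tag <= (others => '1');
    en  <= '0';
    wait for 10 ns;
    assert hit = '0' report "hit asserted while EN is low" severity error;

    -- Matching tags, comparator enabled: hit expected.
    en <= '1';
    wait for 10 ns;
    assert hit = '1' report "no hit for matching tags" severity error;

    -- One tag bit differs: miss expected.
    tag(0) <= '0';
    wait for 10 ns;
    assert hit = '0' report "hit asserted for differing tags" severity error;

    report "tb_tag_comparator finished";
    wait;   -- stop the stimulus process
  end process;
end sim;
```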
Appendix A: VHDL Codes
The following are the VHDL codes for all the .vhd files in the project design.
Files related to L1 and L2 cache memories.
D Flip-Flop
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity dff is
Port ( d : in std_logic;
clk : in std_logic;
q : out std_logic;
en : in std_logic);
end dff;
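architecture Behavioral of dff is
begin
  -- en is in the sensitivity list so that q is tri-stated as soon as en goes low
  process(clk, en)
  begin
    if en='1' then
      -- capture d on the rising clock edge while enabled
      if clk'event and clk='1' then
        q <= d;
      end if;
    else
      -- release the shared output line while disabled
      q <= 'Z';
    end if;
  end process;
end Behavioral;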
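In the dff listing above, q is driven to 'Z' whenever en is low, and after en returns high q stays at 'Z' until the next rising clock edge, so the cell does not present its earlier value again. A storage cell that must hold its bit while deselected needs internal storage that is separate from the tri-state output. The following is a minimal sketch of that idea; the entity name dff_we_oe and the ports we and oe are hypothetical and are not part of the project design.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical variant of dff with separate write enable and output enable.
entity dff_we_oe is
  port ( d   : in  std_logic;
         clk : in  std_logic;
         we  : in  std_logic;   -- write enable: store d on the rising clock edge
         oe  : in  std_logic;   -- output enable: drive q, otherwise release it
         q   : out std_logic);
end dff_we_oe;

architecture Behavioral of dff_we_oe is
  signal q_int : std_logic := '0';   -- stored bit, kept while oe is low
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        q_int <= d;
      end if;
    end if;
  end process;

  -- Tri-state only the output; the stored value is unaffected.
  q <= q_int when oe = '1' else 'Z';
end Behavioral;
```

With this split, a read would assert oe for the selected line without writing, and a write would assert we without needing to drive the shared output.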
Cache Data Line: 128 bits (4, 32-bit words)
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity reg_128_data is
Port ( D : in std_logic_vector(127 downto 0);
clk : in std_logic;
Q : out std_logic_vector(127 downto 0);
en : in std_logic);
end reg_128_data;
architecture Behavioral of reg_128_data is
component dff
port(d: in std_logic;
q: out std_logic;
clk: in std_logic;
en : in std_logic);
end component;
signal outbuf: std_logic_vector(127 downto 0);
begin
gen: for i in 0 to 127 generate
mem: dff port map (d(i),q(i),clk,en);
end generate;
end Behavioral;
Data Block L1 Cache (256 Cache Lines)
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
--This block generates 256x128 cache data memory (256 lines of 128 bits)
entity data_mem is
Port ( Din : in std_logic_vector(127 downto 0);
Dout : out std_logic_vector(127 downto 0);
EN : in std_logic_vector(255 downto 0);
clk: in std_logic);
end data_mem;
architecture Behavioral of data_mem is
component reg_128_data
Port ( D : in std_logic_vector(127 downto 0);
clk : in std_logic;
Q : out std_logic_vector(127 downto 0);
en : in std_logic);
end component;
begin
GEN_array: for i in 0 to 255 generate
REGS: reg_128_data port map (Din(127 downto 0),clk,Dout(127 downto 0),EN(i));
end generate;
end Behavioral;
4:16 decoder
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity decoder4to16 is
Port ( D : in std_logic_vector(3 downto 0);
E : out std_logic_vector(15 downto 0);
F : in std_logic);
end decoder4to16;
architecture Behavioral of decoder4to16 is
begin
  process(D,F)
  begin
    -- Default: all outputs low. This also covers F = '0', so no latch is
    -- inferred and a disabled decoder does not keep a stale output asserted.
    E <= (others => '0');
    if F='1' then
      -- Assert only the output selected by D
      case D is
        when "0000" => E(0) <= '1';
        when "0001" => E(1) <= '1';
        when "0010" => E(2) <= '1';
        when "0011" => E(3) <= '1';
        when "0100" => E(4) <= '1';
        when "0101" => E(5) <= '1';
        when "0110" => E(6) <= '1';
        when "0111" => E(7) <= '1';
        when "1000" => E(8) <= '1';
        when "1001" => E(9) <= '1';
        when "1010" => E(10) <= '1';
        when "1011" => E(11) <= '1';
        when "1100" => E(12) <= '1';
        when "1101" => E(13) <= '1';
        when "1110" => E(14) <= '1';
        when others => E(15) <= '1';
      end case;
    end if;
  end process;
end Behavioral;
Memory Decoder: 8:256 decoder
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
--This Block Generates a 8:2^8 decoder
entity mem_decoder is
port(S: in std_logic_vector(7 downto 0);
EN: out std_logic_vector(255 downto 0);
Mem_EN: in std_logic);
end mem_decoder;
architecture Behavioral of mem_decoder is
component decoder4to16 is
Port ( D : in std_logic_vector(3 downto 0);
E : out std_logic_vector(15 downto 0);
F : in std_logic);
end component;
signal C1: std_logic_vector(15 downto 0);
begin
stage1: decoder4to16 port map(S(7 downto 4),C1(15 downto 0),Mem_EN);
struct: for i in 16 downto 1 generate
stage2: decoder4to16 port map (S(3 downto 0),EN(((16*i)-1) downto ((16*i)-16)),C1(i-1));
end generate;
end Behavioral;
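The two-stage structure above builds the 8:256 decoder from seventeen 4:16 decoders. For comparison, the same one-hot decoding can be sketched behaviorally for any address width using the standard numeric_std package. This is an illustrative alternative, not the project's decoder: the entity name onehot_decoder and the generic N are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical generic N : 2**N one-hot decoder (N = 8 for L1, N = 10 for L2).
entity onehot_decoder is
  generic (N : positive := 8);
  port ( S      : in  std_logic_vector(N-1 downto 0);
         Mem_EN : in  std_logic;
         EN     : out std_logic_vector(2**N-1 downto 0));
end onehot_decoder;

architecture Behavioral of onehot_decoder is
begin
  process(S, Mem_EN)
  begin
    -- All select lines low by default, including when the decoder is disabled.
    EN <= (others => '0');
    if Mem_EN = '1' then
      -- Assert the single line whose index equals the binary value of S.
      EN(to_integer(unsigned(S))) <= '1';
    end if;
  end process;
end Behavioral;
```

The structural version in the report keeps the design in terms of small reusable blocks, while this form leaves the decomposition to the synthesis tool.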
Input Output Buffer:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity inout_buf is
Port ( A : inout std_logic_vector(127 downto 0);
B : inout std_logic_vector(127 downto 0);
WR : in std_logic;
RD : in std_logic);
end inout_buf;
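architecture Behavioral of inout_buf is
begin
  -- A and B are in the sensitivity list so that data changes pass through
  -- while WR or RD is held high
  process(A,B,WR,RD)
  begin
    if WR='1' then
      -- write: drive B from A and release A
      B <= A;
      A <= (others => 'Z');
    elsif RD='1' then
      -- read: drive A from B and release B
      A <= B;
      B <= (others => 'Z');
    else
      -- idle: release both sides
      A <= (others => 'Z');
      B <= (others => 'Z');
    end if;
  end process;
end Behavioral;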
Address Fields:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity address_field is
port(addr: in std_logic_vector(31 downto 0);
word: out std_logic_vector(1 downto 0);
set: out std_logic_vector(7 downto 0);
tag: out std_logic_vector(21 downto 0);
sep: in std_logic);
end address_field;
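architecture Behavioral of address_field is
begin
  -- sep is in the sensitivity list so that the fields update when it is asserted
  process(addr, sep)
  begin
    if sep = '1' then
      word <= addr (1 downto 0);    -- word select, A1-A0
      set <= addr (9 downto 2);     -- set address, A9-A2
      tag <= addr (31 downto 10);   -- tag, A31-A10
    end if;
  end process;
end Behavioral;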
CACHE L1:
library ieee;
use ieee.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
entity memory_L1 is
port(Add: in std_logic_vector(7 downto 0);
Data: inout std_logic_vector(127 downto 0);
RD,WR,CLK,EN: in std_logic);
end memory_L1;
architecture struct of memory_L1 is
component data_mem is
Port ( Din : in std_logic_vector(127 downto 0);
Dout : out std_logic_vector(127 downto 0);
EN : in std_logic_vector(255 downto 0);
clk: in std_logic);
end component;
component inout_buf is
Port ( A : inout std_logic_vector(127 downto 0);
B : inout std_logic_vector(127 downto 0);
WR : in std_logic;
RD : in std_logic);
end component;
component mem_decoder is
port(S: in std_logic_vector(7 downto 0);
EN: out std_logic_vector(255 downto 0);
Mem_EN: in std_logic);
end component;
signal int: std_logic_vector(255 downto 0);
signal dat: std_logic_vector(127 downto 0);
begin
decoder: mem_decoder port map (Add,int,EN);
buff: inout_buf port map (Data,dat,WR,RD);
mem: data_mem port map (dat,Data,int,CLK);
end struct;
Tag Register: 22-bit
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity tag_data_L1 is
Port ( D : in std_logic_vector(21 downto 0);
clk : in std_logic;
Q : out std_logic_vector(21 downto 0);
en : in std_logic);
end tag_data_L1;
architecture Behavioral of tag_data_L1 is
component dff
port(d: in std_logic;
q: out std_logic;
clk: in std_logic;
en : in std_logic);
end component;
signal outbuf: std_logic_vector(21 downto 0);
begin
gen: for i in 0 to 21 generate
mem: dff port map (d(i),q(i),clk,en);
end generate;
end Behavioral;
Tag Memory : 256x22 bit
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
--This block generates 256x22 cache tag memory (256 tags of 22 bits)
entity tag_mem is
Port ( Din : in std_logic_vector(21 downto 0);
Dout : out std_logic_vector(21 downto 0);
EN : in std_logic_vector(255 downto 0);
clk: in std_logic);
end tag_mem;
architecture Behavioral of tag_mem is
component tag_data_L1
Port ( D : in std_logic_vector(21 downto 0);
clk : in std_logic;
Q : out std_logic_vector(21 downto 0);
en : in std_logic);
end component;
begin
GEN_array: for i in 0 to 255 generate
REGS: tag_data_L1 port map (Din(21 downto 0),clk,Dout(21 downto 0),EN(i));
end generate;
end Behavioral;
Cache Tag Buffer:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity inout_buf_tag is
Port ( A : inout std_logic_vector(21 downto 0);
B : inout std_logic_vector(21 downto 0);
WR : in std_logic;
RD : in std_logic);
end inout_buf_tag;
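architecture Behavioral of inout_buf_tag is
begin
  -- A and B are in the sensitivity list so that tag changes pass through
  -- while WR or RD is held high
  process(A,B,WR,RD)
  begin
    if WR='1' then
      -- write: drive B from A and release A
      B <= A;
      A <= (others => 'Z');
    elsif RD='1' then
      -- read: drive A from B and release B
      A <= B;
      B <= (others => 'Z');
    else
      -- idle: release both sides
      A <= (others => 'Z');
      B <= (others => 'Z');
    end if;
  end process;
end Behavioral;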
Cache Tag Memory:
library ieee;
use ieee.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
entity cache_tag_data_L1 is
port(Add: in std_logic_vector(7 downto 0);
tag: inout std_logic_vector(21 downto 0);
RD,WR,CLK,EN: in std_logic);
end cache_tag_data_L1;
architecture struct of cache_tag_data_L1 is
component tag_mem is
Port ( Din : in std_logic_vector(21 downto 0);
Dout : out std_logic_vector(21 downto 0);
EN : in std_logic_vector(255 downto 0);
clk: in std_logic);
end component;
component inout_buf_tag is
Port ( A : inout std_logic_vector(21 downto 0);
B : inout std_logic_vector(21 downto 0);
WR : in std_logic;
RD : in std_logic);
end component;
component mem_decoder is
port(S: in std_logic_vector(7 downto 0);
EN: out std_logic_vector(255 downto 0);
Mem_EN: in std_logic);
end component;
signal int: std_logic_vector(255 downto 0);
signal dat: std_logic_vector(21 downto 0);
begin
decoder: mem_decoder port map (Add,int,EN);
buff: inout_buf_tag port map (tag,dat,WR,RD);
mem: tag_mem port map (dat,tag,int,CLK);
end struct;
Cache Tag Comparator:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
--This block compares the two addresses and produces a cache hit/miss signal
entity tag_comparator is
port( Addr_req: in std_logic_vector (21 downto 0);
Addr_tag: in std_logic_vector (21 downto 0);
tag_hit: out std_logic;
EN: in std_logic);
end tag_comparator;
architecture Behavioral of tag_comparator is
begin
process(Addr_req,Addr_tag,EN)
begin
tag_hit <= '0';
if EN = '1' then
if Addr_req = Addr_tag then
tag_hit <= '1';
end if;
end if;
end process;
end Behavioral;
L1 DATA BUFFER:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- 128 bit data buffer for L1 data cache
entity data_buff_L1 is
Port ( Din : in std_logic_vector(127 downto 0);
Dout : out std_logic_vector(31 downto 0);
EN : in std_logic_vector(1 downto 0);
clk: in std_logic);
end data_buff_L1;
architecture behaviour of data_buff_L1 is
begin
process(clk,EN)
begin
if clk'event and clk = '1'
then case EN is
when "00" => Dout <= Din(127 downto 96);
when "01" => Dout <= Din(95 downto 64);
when "10" => Dout <= Din(63 downto 32);
when others => Dout <= Din(31 downto 0);
end case;
end if;
end process;
end behaviour;
L1 Data Cache: Way 0
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity L1_data_way0 is
port( Add: in std_logic_vector(31 downto 0);
data_out : out std_logic_vector(31 downto 0);
cache_hit: out std_logic;
clk,EN,RD,WR: in std_logic);
end L1_data_way0;
architecture structure of L1_data_way0 is
signal F1: std_logic_vector(1 downto 0);
signal F2: std_logic_vector(7 downto 0);
signal F3: std_logic_vector(21 downto 0);
signal data: std_logic_vector(127 downto 0);
signal tag: std_logic_vector(21 downto 0);
signal select_add: std_logic_vector(255 downto 0);
component memory_L1 is
port(Add: in std_logic_vector(7 downto 0);
Data: inout std_logic_vector(127 downto 0);
RD,WR,CLK,EN: in std_logic);
end component;
component cache_tag_data_L1 is
port(Add: in std_logic_vector(7 downto 0);
tag: inout std_logic_vector(21 downto 0);
RD,WR,CLK,EN: in std_logic);
end component;
component address_field is
port(addr: in std_logic_vector(31 downto 0);
word: out std_logic_vector(1 downto 0);
set: out std_logic_vector(7 downto 0);
tag: out std_logic_vector(21 downto 0);
sep: in std_logic);
end component;
L1 4-way set associative cache: Main hierarchy for L1 cache memory.
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity L1_cache is
port( Add: in std_logic_vector(31 downto 0);
data_out : inout std_logic_vector(31 downto 0);
cache_hit: out std_logic_vector(3 downto 0);
clk,RD,WR: in std_logic;
EN: in std_logic_vector(3 downto 0));
end L1_cache;
architecture structure of L1_cache is
component L1_data_way0 is
port( Add: in std_logic_vector(31 downto 0);
data_out : inout std_logic_vector(31 downto 0);
cache_hit: out std_logic;
clk,EN,RD,WR: in std_logic);
end component;
begin
sets: for i in 0 to 3 generate
struct: L1_data_way0 port map (Add,data_out,cache_hit(i),clk,EN(i),RD,WR);
end generate;
end structure;
L2 Cache memory uses all the above specified .vhd files. Changes are made accordingly to increase the register size and capacity.
Cache Controller VHDL code:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity cache_controller is
Port (add_req: in std_logic_vector(31 downto 0);
C_busy: out std_logic;
C_reset,clk,EN:in std_logic;
L1_miss,L2_miss,RD: out std_logic;
Add_L1: out std_logic_vector(31 downto 0);
DAV_L1: out std_logic;
cache_hit_l1: in std_logic_vector(3 downto 0);
Add_L2: out std_logic_vector(31 downto 0);
DAV_L2: out std_logic;
cache_hit_l2: in std_logic_vector(7 downto 0));
end cache_controller;
architecture behaviour of cache_controller is
  signal C1,C2: std_logic_vector(3 downto 0) := (others => '0');
begin
  -- Clocked process: clk is the only signal needed in the sensitivity list
  process(clk)
  begin
    if clk'event and clk='1' then
      if EN = '1' then
        -- Present the requested address to L1 first
        Add_L1 <= add_req;
        if (cache_hit_l1(0)='1' or cache_hit_l1(1)='1' or cache_hit_l1(2)='1' or cache_hit_l1(3)='1') then
          -- Hit in one of the four L1 ways
          L1_miss <= '0';
          DAV_L1 <= '1';
          DAV_L2 <= '0';
        else
          -- L1 miss: forward the same address to L2
          L1_miss <= '1';
          DAV_L1 <= '0';
          Add_L2 <= add_req;
          if (cache_hit_l2(0)='1' or cache_hit_l2(1)='1' or cache_hit_l2(2)='1' or cache_hit_l2(3)='1' or
              cache_hit_l2(4)='1' or cache_hit_l2(5)='1' or cache_hit_l2(6)='1' or cache_hit_l2(7)='1') then
            -- Hit in one of the eight L2 ways
            L2_miss <= '0';
            DAV_L2 <= '1';
          else
            -- Miss in both levels: report it to the microprocessor
            L2_miss <= '1';
            DAV_L2 <= '0';
            C1 <= C1 + 1;
            Add_L2 <= add_req + 1;
            RD <= clk;   -- clk is '1' at the rising edge, so RD is driven high
          end if;
        end if;
      end if;
    end if;
  end process;
end behaviour;
Appendix B: Simulation Results of Discrete Blocks
Data Buffer L1 Cache
32 Bit word Output selected using A0-A1 Address Lines
Cache Tag Comparator
Cache Hit Cache Miss
Address Field Separator
Memory Decoder
Appendix C: Synthesis Results
L1 Cache Memory
Signal Description
Add(31:0) 32 bit address from microprocessor
Clk: Clock input
EN: Memory enable/select signal
RD,WR: Read, Write Signal
Cache_hit: cache hit/miss signal
Data_out(31:0): Bi-directional data bus
L1 Cache Memory Block generated using Synthesis Tool
Internal Architecture:
Includes Blocks:
Address field Separator
Cache data memory
Cache Tag memory
Cache tag comparator
Input/output buffer.
Internal Architecture of L1 Cache memory Block
The L2 cache memory is identical to the L1 cache memory; the only differences are the number of sets per block and the total number of blocks.
Cache Controller
The figure depicts all the control signals and buses of the cache controller of the system.
Add_req: Address request from the microprocessor
Cache_hit_l1(3:0): cache hit from L1 cache memory block
Cache_hit_l2(7:0): cache hit from L2 cache memory block
Add_L1(31:0): Address bus to L1 cache
Add_L2(31:0): Address bus to L2 cache
C_busy: Cache controller busy (status signal)
Clk: Clock input
DAV_L1/L2: Data valid on data bus from respective cache memory
C_reset: Reset Cache controller
L1/L2_miss: Cache miss from L1/L2 cache
RD: Read Cache controller status
The figure below shows the internal architecture of the cache controller synthesized using the Xilinx ISE 9.1i
platform.
Cache Controller internal Architecture
A view of the Xilinx ISE 9.1i Synthesis Tool window
A view of the ModelSim SE Plus 6.2c Simulation Tool window
References
Computer Architecture and Organization. By: John P. Hayes (McGraw Hill).
Fundamentals of Digital Logic with VHDL Design. By: Stephen Brown & Zvonko Vranesic (Tata McGraw Hill).
A Circuit Design of 32KByte Integrated Cache Memory. TOSHIBA Corporation, TOSHIBA Microcomputer Eng. Corp.
http://www.ece.cmu.edu/~ece741
http://en.kioskea.net/pc/memoire.php3
Computer Architecture - A Quantitative Approach, Fourth Edition. By: John L. Hennessy and David A. Patterson.
Advanced Computer Architecture: Parallelism, Scalability, Programmability. By: Kai Hwang.
http://web.njit.edu/~rlopes/cache-performance.pdf
Lecture notes on memory hierarchy design by Prof. S.G. Ziavras, including
http://web.njit.edu/~ziavras/ECE690-NEW/SYLLABUS-NOTES/CH-5-APP-C/AppC-ch-5-m1-Ziavras.pdf
http://cs.uccs.edu/~cs520/S99ch5.PDF
High Performance Memories. By: Betty Prince.