Project Report On
Memory map selection of real time SDRAM
controller using Verilog
By
RAHUL VERMA
(9015694258)
TABLE OF CONTENTS Page
DECLARATION ............................................................................................................................ii
CERTIFICATE .............................................................................................................................iii
ACKNOWLEDGEMENTS ..........................................................................................................iv
ABSTRACT ..................................................................................................................................vi
LIST OF FIGURES .....................................................................................................................vii
LIST OF TABLES........................................................................................................................viii
LIST OF ABBREVIATIONS……………………………………………………………………...ix
CHAPTER 1 (INTRODUCTION)……………………………………………………………..01
1.1 LITERATURE SURVEY……………………………………………………...02
1.2 GOAL OF THE PROJECT…………………………………………………….03
CHAPTER 2 (BACKGROUND)………………………………………………………………04
2.1 RANDOM ACCESS MEMORY…………………………………........ ……..04
2.2 STATIC RANDOM ACCESS MEMORY …………………………………....04
2.3 DYNAMIC RANDOM ACCESS MEMORY ……………..………………….05
2.4 DEVELOPMENT OF DRAM ………………………………………………...06
2.4.1 DRAM …………………………………………………………………...07
2.4.2 SYNCHRONOUS DRAM……………………………………………….07
2.4.3 DDR1 SDRAM…………………………………………………………....08
2.4.4 DDR2 SDRAM……………………………………………………………08
2.4.5 DDR3 SDRAM………………………………………………………..…09
2.5 TIMELINE……………………………………………………………………09
CHAPTER 3 (METHODOLOGY)…………………………………………………………...11
3.1 HARDWARE…………………………………………………………………11
3.1.1 VIRTEX-6 FPGA………………………………………………………..11
3.1.2 ML605 BOARD………………………………………………………...12
3.2 TOOLS………………………………………………………………………..12
3.2.1 XILINX INTEGRATED SOFTWARE ENVIRONMENT (ISE)……..13
3.2.2 SYNTHESIS AND SIMULATION……………………. ……………..14
3.2.3 IMPLEMENTATION AND HARDWARE VALIDATION…………...14
3.2.4 ANALYSIS OF TURN-AROUND TIMES…………………………….17
3.2.5 XILINX CORE GENERATOR…………………………………………19
CHAPTER 4 (ARCHITECTURE)……………………………………………………………20
4.1 CONTROL INTERFACE MODULE…………………………………………21
4.2 COMMAND MODULE…………………….……………….………………...22
4.3 DATA PATH MODULE…………………………………………………………..24
CHAPTER 5 (OPERATION).....................................................................................................25
5.1 SDRAM OVERVIEW…………………………………………………………26
5.2 FUNCTIONAL DESCRIPTION………………………………………………27
5.3 SDRAM CONTROLLER COMMAND INTERFACE……………………….28
5.3.1 NOP COMMAND……………………………………………………….29
5.3.2 READA COMMAND…………………………………………………...30
5.3.3 WRITEA COMMAND……………………………………………….…31
5.3.4 REFRESH COMMAND…………………………………………….…..32
5.3.5 PRECHARGE COMMAND………………………………………….....34
5.3.6 LOAD_MODE COMMAND……………………………………………35
5.3.7 LOAD_REG1 COMMAND……………………………………………..36
5.3.8 LOAD_REG2 COMMAND……………………………………………..37
CHAPTER 6 (ELEMENTS OF MEMORY BANK)…………………………………………38
6.1 DECODER…………………………………………………………………….38
6.1.1 A 2 TO 4 SINGLE BIT DECODER…………………………………….38
6.2 DEMUX………………………………………………………………………..40
6.3 RAM…………………………………………………………………………...41
6.3.1 TYPES OF RAM………………………………………………………...42
6.4 MUX…………………………………………………………………………...44
6.5 BUFFER……………………………………………………………………….45
6.5.1 VOLTAGE BUFFER…………………………………………………….46
6.5.2 CURRENT BUFFER…………………………………………………….47
6.6 MEMORY BANK……………………………………………………………..48
CHAPTER 7 (RESULT AND CONCLUSIONS)……………………………………………..51
7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON…………..………51
7.1.1 PROJECT………………………………………………………...………51
7.1.2 DEVICE ………………………………………………………………….51
7.1.3 ENVIRONMENT ……………………………………………………….52
7.1.4 DEFAULT ACTIVITY………...………………………………….……..52
7.1.5 ON-CHIP POWER SUMMARY………………………………………...53
7.1.6 THERMAL SUMMARY………………………………………………...53
7.1.7 POWER SUPPLY SUMMARY………………………………………….53
7.1.8 CONFIDENCE LEVEL………………………………………………….54
7.1.9 BY HIERARCHY………………………………………………………..55
7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE…..56
7.2.1. PROJECT………………………………………………………………..56
7.2.2 DEVICE……………………………………………………………….....56
7.2.3 ENVIRONMENT………………………………………………………...57
7.2.4 DEFAULT ACTIVITY RATES…………………………………………57
7.2.5 ON-CHIP POWER SUMMARY………………………………………...58
7.2.6 THERMAL SUMMARY………………………………………………...58
7.2.7 POWER SUPPLY SUMMARY……………………………………...….58
7.2.8 CONFIDENCE LEVEL………………………………………………….59
7.2.9 BY HIERARCHY………………………………………………………..60
7.3 CONCLUSION…………………………………………………………….….60
CHAPTER 8 (FUTURE SCOPE)……………………………………………………………...61
REFERENCES...............................................................................................................................62
LIST OF FIGURES Page
Figure 2.1 DRAM Row Access Latency vs. Year 09
Figure 2.2 DRAM Column Address Time vs. Year 10
Figure 3.1 Screenshot of ISE Project Navigator 13
Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation 15
Figure 3.3 ISim Screen Shot 18
Figure 3.4 CHIPSCOPE Screen Shot 19
Figure 4.0 Architecture of SDRAM controller 20
Figure 4.1 Control Interface Module 21
Figure 4.2 Command Module Block Diagram 23
Figure 4.3 Data Path Module 24
Figure 5.0 SDR SDRAM Controller System-Level Diagram 25
Figure 5.1 Timing diagram for a READA command 30
Figure 5.2 Timing diagram for a WRITEA command 31
Figure 5.3 Timing diagram for a REFRESH command 32
Figure 5.4 Timing diagram for a PRECHARGE command 34
Figure 5.5 Timing diagram for a LOAD_MODE command 35
Figure 6.1 RTL of decoder 39
Figure 6.2 Simulation of Decoder 40
Figure 6.3 RTL of DEMUX 41
Figure 6.4 Simulation Of DEMUX 42
Figure 6.5 RTL of RAM 44
Figure 6.6 Simulation of RAM 44
Figure 6.7 RTL of MUX 46
Figure 6.8 Simulation of MUX 46
Figure 6.9 RTL of Buffer 48
Figure 6.10 Simulation of Buffer 49
Figure 6.11 RTL of Memory Bank 50
Figure 6.12 Simulation of Memory Bank 50
LIST OF TABLES Page
Table 5.1 SDRAM Bus Commands 26
Table 5.2 Interface Signals 28
Table 5.3 Interface Commands 29
Table 5.4 REG1 Bit Definitions 36
Table 7.1 Project 51
Table 7.2 Device 51
Table 7.3 Environment 52
Table 7.4 Default Activity 52
Table 7.5 On-Chip Power Summary 53
Table 7.6 Thermal Summary 53
Table 7.7 Power Supply Summary 53
Table 7.8 Power Supply Current 54
Table 7.9 Confidence Level 54
Table 7.10 By Hierarchy 55
Table 7.11 Project 56
Table 7.12 Device 56
Table 7.13 Environment 57
Table 7.14 Default Activity 57
Table 7.15 On-Chip Power Summary 58
Table 7.16 Thermal Summary 58
Table 7.17 Power Supply Summary 58
Table 7.18 Power Supply Current 59
Table 7.19 Confidence Level 59
Table 7.20 By Hierarchy 60
LIST OF ABBREVIATIONS
A/D Analog To Digital
CAS Column Address Strobe
CLB Configurable Logic Block
DRAM Dynamic Random-Access Memory
FPGA Field-Programmable Gate Array
ISE Integrated Software Environment
I/O Input/ Output
LUTs Look-Up Tables
NCD Native Circuit Description
RAM Random Access Memory
RAS Row Address Strobe
ROM Read Only Memory
SDRAM Synchronous Dynamic Random-Access Memory
SRAM Static Random-Access Memory
XST Xilinx Synthesis Technology
CHAPTER 1
INTRODUCTION
Embedded applications with real-time requirements are mapped to heterogeneous multiprocessor systems. The computational demands placed upon these systems are continuously increasing, while power and area budgets limit the amount of resources that can be expended. To reduce costs, applications are often forced to share hardware resources. Functional correctness of a real-time application is only guaranteed if its timing requirements are considered throughout the entire system; when the requirements are not met, the result may be an unacceptable loss of functionality or severe quality degradation. We focus on the real-time properties of the (off-chip) memory.
SDRAM is a commonly used memory type because it provides a large amount of storage space at
low cost per bit. It comprises a hierarchical structure of banks and rows that have to be opened
and closed explicitly by the memory controller, where only one row in each bank can be open at
a time. Requests to the open row are served at low latency, while a request to a different row results in a high latency, since it requires closing the open row and subsequently opening the requested row. Locality thus strongly influences the performance of the memory
subsystem.
The worst-case (minimum) bandwidth and worst-case (maximum) latency are determined by
the way requests are mapped to the memory. The worst-case latency can be optimized by
accessing the memory at a small granularity (i.e. few words), such that the individual requests
take a small amount of time to complete. This allows fine-grained sharing of the memory
resource, at the expense of efficiency, since the overhead of opening and closing rows is
amortized over only a small number of bits. Latency sensitive requests like cache misses favor
this configuration. Conversely, to optimize for bandwidth, the memory has to be used as
efficiently as possible, which requires memory maps that use a large access granularity.
Existing memory controllers offer only limited configurability of the memory mapping and are
unable to balance this trade-off based on the application requirements. A memory controller
must take the latency and bandwidth requirements of all of its applications into account, while
staying within the given power budget. This requires an understanding of the effect that
different memory maps have on the attainable worst-case bandwidth, latency and power.
1.1 LITERATURE SURVEY
Synchronous DRAM (SDRAM) has become a mainstream memory of choice in embedded
system memory design due to its speed, burst access and pipeline features. For high-end
applications using processors such as Motorola MPC 8260 or Intel StrongArm, the interface to
the SDRAM is supported by the processor’s built-in peripheral module. However, for other
applications, the system designer must design a controller to provide proper commands for
SDRAM initialization, read/write accesses and memory refresh.
In some cases, SDRAM is chosen because the previous generations of DRAM (FP and EDO) are
either end-of-life or not recommended for new designs by the memory vendors. From the board
design point of view, design using earlier generations of DRAM is much easier and more
straightforward than using SDRAM unless the system bus master provides the SDRAM interface
module as mentioned above. This SDRAM controller reference design, located between the
SDRAM and the bus master, reduces the user’s effort to deal with the SDRAM command
interface by providing a simple generic system interface to the bus master.
In today's SDRAM market, there are two major types of SDRAM distinguished by their data
transfer rates. The most common single data rate (SDR) SDRAM transfers data on the rising edge
of the clock. The other is the double data rate (DDR) SDRAM which transfers data on both the
rising and falling edges to double the data transfer throughput. Other than the data transfer phase and the different power-on initialization and mode register definitions, these two SDRAMs share the same command set and basic design concepts. This reference design is targeted at SDR SDRAM; however, due to the similarity of SDR and DDR SDRAM, it can also be modified into a DDR SDRAM controller.
For illustration purposes, the Micron SDR SDRAM MT48LC32M4A2 (8Meg x 4 x 4 banks) is
chosen for this design. Also, this design has been verified by using Micron’s simulation model.
It is highly recommended to download the simulation model from the SDRAM vendors for
timing simulation when any modifications are made to this design.
Several SDRAM controllers focusing on real-time applications have been proposed, all trying to maximize the worst-case performance. One approach uses a static command schedule computed at design time; full knowledge of the application behavior is thus required, making it unable to deal with dynamism in the request streams. Another proposed controller dynamically schedules pre-computed sequences of SDRAM commands according to a fixed set of scheduling rules, and a further controller follows a similar approach: it dynamically schedules commands at run-time according to a set of rules from which an upper bound on the latency of a request is determined, and it uses a memory map that always interleaves requests over all banks in the SDRAM, which sets a high lower bound on the smallest request size that can be supported efficiently. One controller supports multiple bursts to each bank in an access to increase guaranteed bandwidth for large requests; another allows only single-burst accesses to all banks in a fixed sequential manner, although multiple banks can be clustered to create a single logical resource. None of the mentioned controllers take power into account, despite it being an increasingly important design constraint.
1.2 GOAL OF THE PROJECT
1) We explore the full memory map design space by allowing requests to be interleaved over a
variable number of banks. This reduces the minimum access granularity and can thus be
beneficial for applications with small requests or tight latency constraints.
2) We propose a configuration methodology that is aware of the real-time and power constraints,
such that an optimal memory map can be selected.
CHAPTER 2
BACKGROUND
There are two different types of random access memory: static and dynamic. Static random access memory (SRAM) is used for high-speed, low-power applications, while dynamic random access memory (DRAM) is used for its low cost and high density. Designers have been working to make DRAM faster and more energy efficient. The following sections will discuss the differences between these two types of RAM, as well as present the progression of DRAM towards a faster, more energy efficient design.
2.1 RANDOM ACCESS MEMORY
Today, the most common type of memory used in digital systems is random access memory
(RAM). The time it takes to access RAM is not affected by the data’s location in memory. RAM
is volatile, meaning if power is removed, then the stored data is lost. As a result, RAM cannot be
used for permanent storage. However, RAM is used during runtime to quickly store and retrieve
data that is being operated on by a computer. In contrast, nonvolatile memory, such as hard
disks, can be used for storing data even when not powered on. Unfortunately, it takes much
longer for the computer to store and access data from this memory. There are two types of
RAM: static and dynamic. In the following sections the differences between the two types and
the evolution of DRAM will be discussed.
2.2 STATIC RANDOM ACCESS MEMORY
Static random access memory (SRAM) stores data as long as power is being supplied to the
chip.
Each memory cell of SRAM stores one bit of data using six transistors: a flip-flop (four transistors) and two access transistors. SRAM is the faster of the two types of RAM because it does not involve capacitors, which require sense amplification of a small charge. For this reason, it is used in the cache memory of computers. Additionally, SRAM requires a very small amount of power to maintain its data in standby mode. Although SRAM is fast and energy efficient, it is also expensive due to the amount of silicon needed for its large cell size. This presented the need for a denser memory cell, which brought about DRAM.
2.3 DYNAMIC RANDOM ACCESS MEMORY
According to Wakerly, “In order to build RAMs with higher density (more bits per chip), chip designers invented memory cells that use as little as one transistor per bit.” Each DRAM cell consists of one transistor and a capacitor. Since capacitors “leak,” or lose charge over time, DRAM must have a refresh cycle to prevent data loss.
According to a high-performance DRAM study on earlier versions of DRAM, DRAM’s
refresh cycle is one reason DRAM is slower than SRAM. The cells of DRAM use sense
amplifiers to transmit data to the output buffer in the case of a read and transmit data back to the
memory cell in the case of a refresh. During a refresh cycle, the sense amplifier reads the degraded value on a capacitor into a D-latch and writes back the same value to the capacitor so it is charged correctly for a 1 or 0. Since all rows of memory must be refreshed and the sense amplifier must determine the value of an already small, degenerated capacitance, refresh takes a significant amount of time. The refresh cycle typically occurs about every 64 milliseconds; the refresh rate of the latest DRAM (DDR3) is about 1 microsecond.
Although refresh increases memory access time, according to a high-performance DRAM study
on earlier versions of DRAM, the greatest amount of time is lost during row
addressing, more specifically, “[extracting] the required data from the sense amps/row caches” .
During addressing, the memory controller first strobes the row address (RAS) onto the address
bus. Once the RAS is sent, a sense amplifier (one for each cell in the row) determines if a
charge indicating a 1 or 0 is loaded into each capacitor.
This step is long because “the sense amplifier has to read a very weak charge” and “the row is
formed by the gates of memory cells.” The controller then chooses a cell in the row from which to read by strobing the column address (CAS) onto the address bus. A write
requires the enable signal to be asserted at the same time as the CAS, while a read requires the
enable signal to be de-asserted. The time it takes the data to move onto the bus after the CAS is
called the CAS latency.
Although recent generations of DRAM are still slower than SRAM, DRAM is used when a
larger amount of memory is required since it is less expensive. For example, in embedded systems, a small block of SRAM is used for the critical data path, and a large block of DRAM is used to satisfy all other needs. The following section will discuss the development of
DRAM into a faster, more energy efficient memory.
2.4 DEVELOPMENT OF DRAM
Many factors are considered in the development of high performance RAM. Ideally, the
developer would always like memory to transfer more data and respond in less time; memory
would have higher bandwidth and lower latency. However, improving upon one factor often
involves sacrificing the other.
Bandwidth is the amount of data transferred per second. It depends on the width of the data
bus and the frequency at which data is being transferred. Latency is the time between when the
address strobe is sent to memory and when the data is placed on the data bus. DRAM is slower than SRAM because it is periodically unavailable during the refresh cycle and because it takes a much longer time to extract data onto the memory bus. Advancements have been made, however, to several different aspects of DRAM to increase bandwidth and decrease latency.
Over time, DRAM has evolved to become faster and more energy efficient by decreasing in cell
size and increasing in capacity. In the following section, we will look at different types of
DRAM and how DDR3 memory has come to be.
2.4.1 DRAM
One of the reasons the original DRAM was very slow is because of extensive addressing
overhead. In the original DRAM, an address was required for every 64-bit access to memory.
Each access took six clock cycles. For a four 64-bit access to consecutive addresses in memory,
the notation for timing was 6-6-6-6. Dashes separate memory accesses and the numbers indicate
how long the accesses take. This DRAM timing example took 24 cycles to access the memory
four times. In contrast, more recent DRAM implements burst technology which can send
many 64-bit words toconsecutive addresses. While the first access still takes six clock cycles
due memory accessing, the next three adjacent addresses can be performed in as little as one
clock cycle since the addressing does not need to be repeated.
During burst mode, the timing would be 6-1-1-1, a total of nine clock cycles. The original DRAM
is also slower than its descendants because it is asynchronous. This means there is no memory
bus clock to synchronize the input and output signals of the memory chip. The timing
specifications are not based on a clock edge, but rather on maximum and minimum timing
values (in seconds). The user would need to worry about designing a state machine with idle
states, which may be inconsistent when running the memory at different frequencies.
2.4.2 Synchronous DRAM
In order to decrease latency, SDRAM utilizes a memory bus clock to synchronize signals to and
from the system and memory. Synchronization ensures that the memory controller does not need
to follow strict timing; it simplifies the implemented logic and reduces memory access latency.
With a synchronous bus, data is available at each clock cycle.
SDRAM divides memory into two to four banks for concurrent access to different parts of
memory. Simultaneous access allows continuous data flow by ensuring there will always be a memory bank ready for access. The addition of banks adds another segment to the addressing,
resulting in a bank, row and column address.
The memory controller determines if an access addresses the same bank and row as the
previous access, so only a column address strobe must be sent. This allows the access to occur
much more quickly and can decrease overall latency.
2.4.3 DDR1 SDRAM
DDR1 SDRAM (i.e. the first generation of DDR SDRAM) doubles the data rate (hence the term DDR) of SDRAM without changing the clock speed or frequency. DDR transfers data on both the rising and falling edges of the clock, and has a pre-fetch buffer and low-voltage signaling, which makes it more energy efficient than previous designs.
Unlike SDRAM, which transfers 1 bit per clock cycle from the memory array to the data queue,
DDR1 transfers 2 bits to the queue in two separate pipelines. The bits are released in order on the
same output line. This is called a 2n-prefetch architecture. In addition, DDR1 utilizes double
transition clocking by triggering on both the rising and falling edge of the clock to transfer data.
As a result, the bandwidth of DDR1 is doubled without an increase in the clock frequency.
In addition to doubling the bandwidth, DDR1 made advances in energy efficiency. DDR1 can
operate at 2.5V instead of the 3.3V operating point of SDRAM thanks to low voltage signaling
technology.
2.4.4 DDR2 SDRAM
Data rates of DDR2 SDRAM are up to eight times those of the original SDRAM. At an operating voltage of 1.8 V, it achieves lower power consumption than DDR1. DDR2 SDRAM has a 4-bit prefetch buffer, an improvement over the DDR1 2-bit prefetch. This means that 4 bits are
transferred per clock cycle from the memory array to the data bus, which increases bandwidth.
2.4.5 DDR3 SDRAM
DDR3 provides two burst modes for both reading and writing: burst chop (BC4) and burst
length eight (BL8). BC4 allows bursts of four by treating data as though half of it is masked.
This creates smooth transitioning if switching from DDR2 to DDR3 memory. However, burst
mode BL8 is the primary burst mode. BL8 allows the most data to be transferred in the least
amount of time; it transfers the greatest number of 64-bit data packets (eight) to or from
consecutive addresses in memory, which means addressing occurs once for every eight data
packets sent. In order to support a burst length of eight data packets, DDR3 SDRAM has an 8-bit prefetch buffer. DDR3, like its predecessors, improves not only bandwidth but also energy conservation. Power consumption of DDR3 can be up to 30 percent less than that of DDR2. The DDR3 operating voltage is the lowest yet, at 1.5 V, and low-voltage versions are supported at 1.35 V.
2.5 TIMELINE
Ideally, memory performance would improve at the same rate as central processing unit
(CPU) performance. However, memory latency has only improved about five percent each
year. The longest latency (RAS latency) of the newest release of DRAM for each year is
shown in the plot in Figure 2.1.
Figure 2.1 DRAM Row Access Latency vs. Year
As seen in Figure 2.1, the row access latency decreases linearly with every new release of
DRAM until 1996. Once SDRAM is released in 1996, the difference in latency from year to
year is much smaller. With recent memory releases it is much more difficult to reduce RAS
latency.
This can be seen especially for the DDR2 and DDR3 memory releases from 2006 to 2012. CAS latency, unlike RAS latency, consistently decreases (bandwidth increases) with every memory release, and in the new DDR3 memory it is very close to 0 ns. Figure 2.2 shows the column access latency.
Figure 2.2 DRAM Column Address Time vs. Year
Looking at some prominent areas of the CAS graph, it can be seen in Figure 2.2 that bandwidth
greatly increased (CAS decreased) from 1983 to 1986. This is due to the switch from NMOS DRAMs to CMOS DRAMs. In 1996 the first SDRAM was released. The CAS latency decreased
(bandwidth increased) due to synchronization and banking. In later years, the CAS latency does
not decrease by much, but this is expected since the latency is already much smaller. Comparing
Figure 2.2 to Figure 2.1, CAS time decreases much more drastically than RAS time. This
means the bandwidth greatly improves, while latency improves much more slowly. In 2010,
when DDR2 was released, it can be seen that latency was sacrificed (Figure 2.1) for an
increase in bandwidth (Figure 2.2).
CHAPTER 3
METHODOLOGY
In this section, the ML605 board and Virtex-6 FPGA hardware are described, as well as the tools utilized for design and validation. The Xilinx Integrated Software Environment (ISE) was used for design, and iSim and ChipScope were used for validation in simulation and in hardware.
3.1 HARDWARE
3.1.1 Virtex-6FPGA
The Virtex-6 FPGA (XC6VLX240T) is used to implement the arbiter. This FPGA has 241,152 logic cells and is organized into banks (40 pins per bank). These logic cells, or slices, are
composed of four look-up tables (LUTs), multiplexers and arithmetic carry logic.
LUTs implement Boolean functions, and multiplexers enable combinatorial logic. Two slices
form a configurable logic block (CLB). In order to distribute a clock signal to all these logic
blocks, the FPGA has five types of clock lines: BUFG, BUFR, BUFIO, BUFH, and high-performance clock. These lines satisfy “requirements of high fan out, short propagation delay,
and extremely low skew”. The clock lines are also split into categories depending on the sections
of the FPGA and components they drive. The three categories are: global, regional, and I/O lines.
Global clock lines drive all flip-flops, clock enables, and many logic inputs. Regional clock lines
drive all clock destinations in their region and two bordering regions. There are six to eighteen
regions in an FPGA. Finally, I/O clock lines are very fast and only drive I/O logic and
serializer/deserializer circuits.
3.1.2 ML605 Board
The Virtex-6 FPGA is included on the ML605 Development Board. In addition to the FPGA, the
development board includes a 512 MB DDR3 small outline dual inline memory module
(SODIMM), which our design arbitrates access to. A SODIMM is the type of board the memory is manufactured on. The board also includes 32 MB of linear BPI Flash and 8 Kb of IIC EEPROM.
Communication mechanisms provided on the board include Ethernet, SFP transceiver
connector, GTX port, USB to UART Bridge, USB host and peripheral port, and PCI Express.
The only connection used during this project was the USB JTAG connector. It was used to
program and debug the FPGA from the host computer.
There are three clock sources on the board: a 200 MHz differential oscillator, a 66 MHz single-ended oscillator, and SMA connectors for an external clock. This project utilizes the 200 MHz
oscillator. Peripherals on the ML605 board were useful for debugging purposes. The push
buttons were used to trigger sections of code execution in ChipScope such as reading and
writing from memory. Dip switches acted as configuration inputs to our code. For example,
they acted as a safety to ensure the buttons on the board were not automatically set to active
when the code was downloaded to the board. In addition, the value on the switches indicated
which system would begin writing first for debugging purposes. LEDs were used to check
functionality of sections of code as well, and, for additional validation, they can be used to indicate if an error has occurred. Although we did not use it, the ML605 board provides an LCD.
3.2 TOOLS
Now that the hardware on which the design is placed has been described, the software used to create and manipulate the design can be discussed. The tools for design include those provided within the Xilinx Integrated Software Environment, and the tools used for validation include iSim and ChipScope. This section also looks at the turn-around time for both validation tools and what it means for the design process.
3.2.1 Xilinx Integrated Software Environment (ISE)
We designed the arbiter using Verilog hardware description language in Xilinx Integrated
Software Environment (ISE). ISE is an environment in which the user can “take [their] design
from design entry through Xilinx device programming”. The main workbench for ISE is ISE
Project Navigator. The Project Navigator tool allows the user to effectively manage their
design and call upon development processes. Figure 3.1 shows a screen shot of ISE Project Navigator:
Figure 3.1 Screen Shot of ISE Project Navigator
Figure 3.1 shows some main windows in ISE Project Navigator. On the right hand side is the
window for code entry. The hierarchical view of modules in the design appears on the left, and when implementation is selected from the top, the design implementation progress is
shown in the bottom window. If simulation were selected instead of implementation there
would be an option to run the design for simulation.
The main processes called upon by ISE are synthesis, implementation, and bit stream
generation. During synthesis, Xilinx Synthesis Technology (XST) is called upon. XST
synthesizes Verilog, VHDL or mixed language designs and creates netlist files. Netlist files, or
NGC files, contain the design logic and constraints.
They are saved for use in the implementation process. During synthesis, the XST checks for
synthesis errors (parsing) and infers macros from the code. When XST infers macros, it recognizes parts of the code that can be replaced with components from its library, such as MUXes and RAMs, and encodes them in a way that is best for reduced area and/or increased speed.
Implementation is the longest process to perform on the design. The first step of
implementation is to combine the netlists and constraints into a design/NGD file. The NGD
file is the design file reduced to Xilinx primitives. This process is called translation. During the
second step, mapping, the design is fitted into the target device. This involves turning logic into
FPGA elements such as configurable logic blocks. Mapping produces a native circuit
description (NCD) file.
The third step, place and route, uses the mapped NCD file to place the design and route it according to the timing constraints. Finally, the program file is generated and, at the finish of this step, a bit stream is ready to be downloaded to the board.
3.2.2 Synthesis and Simulation
Once the design has been synthesized, simulation of the design is possible. Simulating a design
enables verification of logic functionality and timing. We used the simulation tool in ISE (iSim) to view timing and signal values. In order to utilize iSim, we created a test bench to provide the
design with stimulus. Since simulation only requires design synthesis, it is a relatively fast
process. The short turn-around time of simulation means we were able to iteratively test small
changes to the design and, therefore, debug our code efficiently.
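As a minimal sketch of what such a test bench looks like (the module and signal names below are illustrative, not the actual project files), the test bench only needs to generate a clock, drive stimulus into the design under test, and let iSim display the resulting waveforms:

`timescale 1ns / 1ps

// Trivial design used only so that the test bench below is self-contained.
module simple_reg (
  input  wire       clk,
  input  wire       rst_n,
  input  wire [7:0] din,
  output reg  [7:0] dout
);
  always @(posedge clk or negedge rst_n)
    if (!rst_n) dout <= 8'h00;
    else        dout <= din;
endmodule

// Minimal test bench sketch: generate a clock, apply stimulus, observe outputs.
module tb_simple_reg;
  reg        clk   = 1'b0;
  reg        rst_n = 1'b0;
  reg  [7:0] din   = 8'h00;
  wire [7:0] dout;

  always #5 clk = ~clk;            // 100 MHz clock

  simple_reg uut (.clk(clk), .rst_n(rst_n), .din(din), .dout(dout));

  initial begin
    #20 rst_n = 1'b1;              // release reset
    #10 din   = 8'hA5;             // drive a test value
    #20 $display("dout = %h", dout);
    #50 $finish;                   // end the simulation
  end
endmodule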
3.2.3 Implementation and Hardware Validation
Once the design was working in simulation, we still needed to test the design’s
functionality in hardware. Testing the design in hardware is the most reliable validation
method. In order to download the design to the board, it first needs to be implemented in ISE.
Implementation has a much longer turn-around time than synthesis, so while functionality in hardware ensures the design is working, simulation is the practical choice for iterative verification.
In order to test our design in hardware, we utilized ChipScope Pro Analyzer, a GUI which
allows the user to “configure [their] device, choose triggers, setup the console, and view results
of the capture on the fly”. In order to use ChipScope Pro, you may either insert ChipScope Pro cores into the design using the Core Generator, a tool that can be accessed in ISE Project Navigator, or utilize the Plan Ahead or Core Inserter tool, which automatically inserts cores into the design netlist for you. One method of inserting ChipScope cores into the design is by utilizing the Plan Ahead software. The Plan Ahead tool enables the creation of floorplans.

Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation
Floorplans provide an initial view of “the design’s interconnect flow and logic module sizes.” This helps the designer to “avoid timing, utilization, and routing congestion issues.” Plan Ahead also allows the designer to create and configure I/O ports and analyze implementation results, which aids in the discovery of bottlenecks in the design.
For our project, however, we utilized Plan Ahead only for its ability to automatically insert
ChipScope cores. Plan Ahead proved to be inefficient for our purposes since many times, when a
change was made in the design, the whole netlist would need to be selected again.
In addition, there were bugs in the software that greatly affected the turn-around time of debugging, and it crashed several times. If Plan Ahead were used for floor planning and other design tools, then it might have proved to be much more useful.
In place of Plan Ahead, we utilized the Core Generator within ISE. The ChipScope
cores provided by Xilinx include ICON, ILA, VIO, ATC2, and IBERT. The designer can
choose which cores to insert by using the Core Generator in ISE. The ICON core provides
communication between the different cores and the computer running ChipScope. It can connect
up to fifteen ILA, VIO, and ATC2 cores.
The ILA core is used to synchronously monitor internal signals. It contains logic to trigger inputs
and outputs and capture data. ILA cores allow up to sixteen trigger ports, which can be 1 to
256 bits wide. The VIO core can monitor signals like ILA, but also drive internal FPGA signals
real-time. The ATC2 core is similar to the ILA core, but was created for Agilent FPGA
dynamic probe technology. Finally, the IBERT core contains “all the logic to control, monitor, and change transceiver parameters and perform bit error ratio tests.”
The only ChipScope cores we were concerned with in this project were the ICON and ILA cores. We inserted one ChipScope ILA core and one ICON core using the Core Generator within ISE Project Navigator. The ILA core allowed us to monitor internal signals in the FPGA.
Instead of inserting a VIO core, which allows inputs to and outputs from ChipScope, we used
buttons to trigger the execution of write and read logic.
3.2.4 Analysis of Turn-Around Times
As introduced in sections 3.2.2 and 3.2.3, implementation takes much longer than synthesis.
Therefore, when it comes down to turn-around time, simulation is much more effective for
iterative debugging. In Figure 3.2, the phases for simulation and hardware validation can be
seen as well as the time it takes to complete each phase.
For simulation, the process starts at Verilog code, becomes synthesized logic and, using a test bench, is run in iSim for viewing. This process takes about eight minutes in total. A system’s simulation run-time is much longer than if it were running on hardware, but simulation is still faster than hardware validation because it does not have to undergo implementation.
The bottleneck in our simulation process is the set up time for the DDR3 memory model which
accounts for most of the simulation time. Hardware validation starts at Verilog code, is
synthesized, implemented, and imported into ChipScope. This whole process takes about fifteen
minutes.
Most of the time spent for hardware validation is on implementation of the design. In addition,
hardware validation requires more of the user’s attention. It is more difficult and takes more
time to set up a ChipScope core than it does to create a test bench for simulation. While a
test bench (green) involves writing some simple code, a ChipScope core (orange) involves
setting up all the signals to be probed. Not only is simulation faster, but the iSim tool is easier
to use than ChipScope. Figure 3.3 shows the iSim interface.
Figure 3.3 iSim Screen Shot
The screen shot of iSim shows the instance names in the first column, all the signals to choose
from in the second, and the signals and their waveforms in the third and fourth columns. The
user can view any signal without having to port it out of the design and re-implement like
when using ChipScope. When adding an additional signal in iSim, only simulation needs to be
restarted. The iSim interface makes debugging much easier with collapsible signal viewing,
grouping abilities, and a large window for viewing many signals at once.
A screen shot of ChipScope is shown in Figure 3.4. In ChipScope, you can view the devices, signals, triggers, and waveforms windows. The time window ChipScope is able to capture is much smaller than iSim's. For this reason, triggers are required to execute different parts of code; this is where the buttons were utilized. If a signal could not fit into the allowable number of signal inputs or was forgotten, it would need to be added to the design and implemented all over again, a much longer turn-around time than simulation. Therefore, simulation is used for iterative debugging and functionality testing, while hardware validation is the next step to ensure design accuracy.
Figure 3.4 ChipScope Screen Shot
3.2.5 Xilinx Core Generator
One tool in ISE that was very important to our project was the CORE Generator. The CORE Generator provided us with not only the ChipScope cores, but also the memory controller and FIFOs. It can be accessed within ISE Project Navigator and provides many additional functions for the designer.
The options provided for creating FIFOs, for example, include common or independent clocks; first-word fall-through; a variety of flags to indicate the amount of data in the FIFO; and the write width, read width, and depth.
The different width capabilities allowed us to create asynchronous FIFOs. The memory
controller was created using the Xilinx memory interface generator (MIG). There were options
to use an AXI4, native, or user interface, which is discussed in a following section on interfacing
with the Xilinx MIG.
CHAPTER 4
ARCHITECTURE
The SDR SDRAM Controller consists of four main modules: the SDRAM controller, control
interface, command, and data path modules. The SDRAM controller module is the top-level
module that instantiates the three lower modules and brings the whole design together. The
control interface module accepts commands and related memory addresses from the host,
decoding the command and passing the request to the command module. The command module
accepts commands and addresses from the control interface module, and generates the proper
commands to the SDRAM. The data path module handles the data path operations during
WRITEA and READA commands. The SDRAM controller module also instantiates a PLL that is
used in the CLOCK_LOCK mode to improve I/O timing. This PLL is not essential to the
operation of the SDR SDRAM Controller and can be easily removed.
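The following structural sketch shows how such a top level can simply wire the three lower modules together. The module and port names are placeholders chosen for illustration (the real modules have many more ports, and the PLL is omitted); the stub sub-modules exist only so the sketch elaborates on its own:

// Stub sub-modules standing in for the real control interface, command and
// data path modules, so that the top-level wiring below is self-contained.
module control_interface_stub (
  input  wire       clk,
  input  wire [2:0] cmd,
  output reg  [2:0] cmd_dec
);
  always @(posedge clk) cmd_dec <= cmd;                     // register/decode host commands
endmodule

module command_stub (
  input  wire       clk,
  input  wire [2:0] cmd_dec,
  output reg        ras_n, cas_n, we_n
);
  always @(posedge clk) {ras_n, cas_n, we_n} <= ~cmd_dec;   // drive SDRAM command pins
endmodule

module data_path_stub (
  input  wire        clk,
  input  wire [31:0] datain,
  output reg  [31:0] dataout
);
  always @(posedge clk) dataout <= datain;                  // pipeline data to/from the SDRAM
endmodule

// Top level: instantiates the three lower modules and connects them.
module sdram_controller_top (
  input  wire        clk,
  input  wire [2:0]  cmd,
  input  wire [31:0] datain,
  output wire [31:0] dataout,
  output wire        ras_n, cas_n, we_n
);
  wire [2:0] cmd_dec;

  control_interface_stub u_ctrl (.clk(clk), .cmd(cmd), .cmd_dec(cmd_dec));
  command_stub           u_cmd  (.clk(clk), .cmd_dec(cmd_dec),
                                 .ras_n(ras_n), .cas_n(cas_n), .we_n(we_n));
  data_path_stub         u_dp   (.clk(clk), .datain(datain), .dataout(dataout));
endmodule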
Figure 4.0 Architecture of SDRAM controller
4.1 CONTROL INTERFACE MODULE
The control interface module decodes and registers commands from the host, and passes the
decoded NOP, WRITEA, READA, REFRESH, PRECHARGE, and LOAD_MODE commands,
and ADDR to the command module. The LOAD_REG1 and LOAD_REG2 commands are
decoded and used internally to load the REG1 and REG2 registers with values from ADDR.
Figure 4.1 shows the control interface module block diagram.
Figure 4.1 Control Interface Module
The control interface module also contains a 16-bit down counter and control circuit that is used
to generate periodic refresh commands to the command module. The 16-bit down counter is
loaded with the value from REG2 and counts down to zero. The REFRESH_REQ output is
asserted when the counter reaches zero and remains asserted until the command module
acknowledges the request. The acknowledge from the command module causes the down counter
to be reloaded with REG2 and the process repeats. REG2 is a 16-bit value that represents the
period between REFRESH commands that the SDR SDRAM Controller issues. The value is set
by the equation int (refresh_period/clock_period).
For example, if an SDRAM device that is connected to the SDR SDRAM Controller has a 64-ms,
4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least
every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked by a
100-MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562d.
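A sketch of this refresh-request logic is shown below; the module and signal names are assumptions made for illustration, not the exact names used in the design:

// Sketch of the periodic refresh-request logic described above (names assumed).
// REG2 holds int(refresh_period / clock_period); with a 100 MHz clock and a
// 64 ms / 4096-cycle refresh requirement, REG2 would be at most 1562.
module refresh_timer (
  input  wire        clk,
  input  wire        rst,
  input  wire [15:0] reg2,         // refresh period in clock cycles
  input  wire        refresh_ack,  // acknowledge from the command module
  output reg         refresh_req   // periodic refresh request
);
  reg [15:0] count;

  always @(posedge clk) begin
    if (rst) begin
      count       <= reg2;
      refresh_req <= 1'b0;
    end else if (refresh_ack) begin
      count       <= reg2;         // reload on acknowledge and repeat
      refresh_req <= 1'b0;
    end else if (count == 16'd0) begin
      refresh_req <= 1'b1;         // request held until acknowledged
    end else begin
      count <= count - 16'd1;      // 16-bit down counter
    end
  end
endmodule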
4.2 COMMAND MODULE
The command module accepts decoded commands from the control interface module, refresh
requests from the refresh control logic, and generates the appropriate commands to the SDRAM.
The module contains a simple arbiter that arbitrates between the commands from the host
interface and the refresh requests from the refresh control logic. The refresh requests from the
refresh control logic have priority over the commands from the host interface. If a command from
the host arrives at the same time as, or during, a hidden refresh operation, the arbiter holds off the host by not asserting CMDACK until the hidden refresh operation is complete. If a hidden refresh command is received while a host operation is in progress, the hidden refresh is held off until the
host operation is complete. Figure 4.2 shows the command module block diagram.
Figure 4.2 Command Module Block Diagram
After the arbiter has accepted a command from the host, the command is passed onto the
command generator portion of the command module. The command module uses three shift
registers to generate the appropriate timing between the commands that are issued to the
SDRAM. One shift register is used to control the timing the ACTIVATE command; a second is
used to control the positioning of the READA or WRITEA commands; a third is used to time
command durations, which allows the arbiter to determine if the last requested operation has been
completed.
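As a simplified sketch of this shift-register technique (signal and parameter names are assumed for illustration), a one-hot token can be loaded when an access is accepted and shifted every cycle; each tap then marks the cycle on which a particular SDRAM command should be driven:

// Simplified sketch of shift-register command spacing (names assumed).
// A one-hot value is loaded when an access starts; as it shifts, each tap marks
// the cycle on which a particular SDRAM command should be issued.
module cmd_spacer #(
  parameter RCD = 3                    // ACTIVATE-to-READ/WRITE delay in clocks
) (
  input  wire clk,
  input  wire start,                   // accepted host command
  output wire do_activate,             // drive ACTIVATE this cycle
  output wire do_rw                    // drive READA/WRITEA this cycle
);
  reg [7:0] shift;

  always @(posedge clk)
    if (start) shift <= 8'b0000_0001;  // load a single one-hot token
    else       shift <= {shift[6:0], 1'b0};

  assign do_activate = shift[0];
  assign do_rw       = shift[RCD];     // tap selected by the programmed RCD
endmodule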
The command module also performs the multiplexing of the address to the SDRAM. The row
portion of the address is multiplexed out to the SDRAM outputs A[11:0] during the
ACTIVATE(RAS) command. The column portion is then multiplexed out to the SDRAM address
outputs during a READA (CAS) or WRITEA command.
The output signal OE is generated by the command module to control the tristate buffers in the last stage of the DATAIN path in the data path module.
4.3 DATA PATH MODULE
The data path module provides the SDRAM data interface to the host. Host data is accepted on DATAIN for WRITEA commands, and data is provided to the host on DATAOUT during READA commands.
Figure 4.3 shows the data path module block diagram.
Figure 4.3 Data Path Module
The DATAIN path consists of a 2-stage pipeline to align data properly relative to CMDACK and the commands that are issued to the SDRAM. DATAOUT consists of a 2-stage pipeline that registers data from the SDRAM during a READA command. The DATAOUT pipeline delay can be reduced to one or even zero registers, with the only effect that the relationship of DATAOUT to CMDACK changes.
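A sketch of these pipelines is given below; the widths, port names and tristate-enable handling are assumptions for illustration rather than the exact interface of the design:

// Sketch of the 2-stage DATAIN/DATAOUT pipelines (widths and names assumed).
module data_path_sketch #(
  parameter WIDTH = 32
) (
  input  wire             clk,
  input  wire             oe,             // from the command module
  input  wire [WIDTH-1:0] datain,         // host write data
  input  wire [WIDTH-1:0] sdram_dq_in,    // read data from the SDRAM
  output wire [WIDTH-1:0] sdram_dq_out,   // write data to the SDRAM
  output wire             sdram_dq_oe,    // enables the tristate drivers
  output reg  [WIDTH-1:0] dataout         // read data back to the host
);
  reg [WIDTH-1:0] din_r1, din_r2;         // 2-stage DATAIN pipeline
  reg [WIDTH-1:0] dout_r1;                // first DATAOUT stage

  always @(posedge clk) begin
    din_r1  <= datain;
    din_r2  <= din_r1;                    // aligns data with the SDRAM command
    dout_r1 <= sdram_dq_in;
    dataout <= dout_r1;                   // second DATAOUT stage
  end

  assign sdram_dq_out = din_r2;
  assign sdram_dq_oe  = oe;               // OE gates the last stage onto DQ
endmodule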
CHAPTER 5
OPERATION
The single data rate (SDR) synchronous dynamic random access memory (SDRAM) controller
provides a simplified interface to industry standard SDR SDRAM. The SDR SDRAM Controller
is available in either Verilog HDL or VHDL and is optimized for the architecture. The SDR
SDRAM Controller supports the following features:
 Burst lengths of 1, 2, 4, or 8 data words.
 CAS latency of 2 or 3 clock cycles.
 16-bit programmable refresh counter used for automatic refresh.
 Two chip selects for SDRAM devices.
 Supports the NOP, READA, WRITEA, AUTO_REFRESH, PRECHARGE, ACTIVATE,
BURST_STOP, and LOAD_MR commands.
 Support for full-page mode operation.
 Data mask line for write operations.
 PLL to increase system performance.
Figure 5.0 SDR SDRAM Controller System-Level Diagram
5.1 SDRAM OVERVIEW
SDRAM is high-speed dynamic random access memory (DRAM) with a synchronous interface.
The synchronous interface and fully-pipelined internal architecture of SDRAM allows extremely
fast data rates if used efficiently. Internally, SDRAM devices are organized in banks of memory,
which are addressed by row and column. The number of row- and column-address bits and the
number of banks depends on the size of the memory.
SDRAM is controlled by bus commands that are formed using combinations of the RASN, CASN, and WEN signals. For instance, on a clock cycle where all three signals are high, the associated command is a no operation (NOP). A NOP is also indicated when the chip select is not asserted.
Table 5.1 shows the standard SDRAM bus commands.
Table 5.1 SDRAM Bus Commands
SDRAM banks must be opened before a range of addresses can be written to or read from. The
row and bank to be opened are registered coincident with the ACT command.
When a bank is accessed for a read or a write it may be necessary to close the bank and re-open it
if the row to be accessed is different than the row that is currently opened.
Closing a bank is done with the PCH command.
The primary commands used to access SDRAM are RD and WR. When the WR command is issued, the initial column address and data word are registered. When a RD command is issued, the initial address is registered. The initial data appears on the data bus 1 to 3 clock cycles later.
This is known as CAS latency and is due to the time required to physically read the internal DRAM
core and register the data on the bus. The CAS latency depends on the speed of the SDRAM and
the frequency of the memory clock. In general, the faster the clock, the more cycles of CAS latency
are required. After the initial RD or WR command, sequential reads and writes continue until the burst length is reached or a BT command is issued. SDRAM memory devices support burst lengths
of 1, 2, 4, or 8 data cycles. The ARF is issued periodically to ensure data retention. This function is
performed by the SDR SDRAM Controller and is transparent to the user.
The LMR is used to configure the SDRAM mode register which stores the CAS latency, burst
length, burst type, and write burst mode. Consult the SDRAM specification for additional details.
SDRAM comes in dual in-line memory modules (DIMMs), small-outline DIMMs (SO-DIMMs)
and chips. To reduce pin count SDRAM row and column addresses are multiplexed on the same
pins. SDRAM often includes more than one bank of memory internally, and DIMMs may require multiple chip selects.
5.2 FUNCTIONAL DESCRIPTION
Table 5.2 shows the SDR SDRAM Controller interface signals. All signals are synchronous to the system clock, and outputs are registered at the SDR SDRAM Controller’s outputs.
Table 5.2 Interface Signals
5.3 SDRAM CONTROLLER COMMAND INTERFACE
The SDR SDRAM Controller provides a synchronous command interface to the SDRAM and several control registers. Table 5.3 shows the commands, which are described in the following sections. The following rules apply to the commands, with reference to Table 5.2:
 All commands, except NOP, are driven by the user onto CMD [2:0]; ADDR and DATAIN are set appropriately for the requested command. The controller registers the command on the next rising clock edge.
 To acknowledge the command, the controller asserts CMDACK for one clock period.
 For READA or WRITEA commands, the user should start receiving or writing data on DATAOUT and DATAIN.
 The user must drive NOP onto CMD [2:0] by the next rising clock edge after CMDACK is asserted.
Table 5.3 Interface Commands
5.3.1 NOP Command
NOP is a no-operation command to the controller. When NOP is detected by the controller, it performs a NOP in the following clock cycle. A NOP must be issued in the clock cycle following the controller's acknowledgement of a command.
The NOP command has no effect on SDRAM accesses that are already in progress.
5.3.2 READA Command
Figure 5.1 Timing diagram for a READA command
The READA command instructs the SDR SDRAM Controller to perform a burst read with auto-
precharge to the SDRAM at the memory address specified by ADDR. The SDR SDRAM
Controller issues an ACTIVATE command to the SDRAM followed by a READA command. The
read burst data first appears on DATAOUT (RCD + CL + 2) clocks after the SDR SDRAM Controller asserts CMDACK. During a READA command the user must keep DM low.
When the controller is configured for full-page mode, the READA command becomes READ (READ without auto-precharge). Figure 5.1 shows an example timing diagram for a READA command.
The following sequence describes the general operation of the READA command:
 The user asserts READA, ADDR and DM.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.
 One clock after CMDACK is asserted, the user must assert NOP.
 The controller presents the first read burst value on DATAOUT; the remainder of the read burst follows on every clock cycle.
5.3.3 WRITEA Command
Figure 5.2 Timing diagram for a WRITEA command
The WRITEA command instructs the SDR SDRAM Controller to perform a burst write with auto-
precharge to the SDRAM at the memory address specified by ADDR.
The SDR SDRAM Controller will issue an ACTIVATE command to the SDRAM followed by a WRITEA command. The first data value in the burst sequence must be presented with the WRITEA command and the ADDR address. The host must start clocking data, along with the desired DM values, into the SDR SDRAM Controller (tRCD – 2) clocks after the SDR SDRAM Controller has acknowledged the WRITEA command.
See an SDRAM data sheet for how to use the data mask lines DM/DQM. When the SDR SDRAM Controller is in full-page mode, WRITEA becomes WRITE (write without auto-precharge).
Figure 5.2 shows an example timing diagram for a WRITEA command. The following sequence
describes the general operation of a WRITEA command:
 The user asserts WRITEA, ADDR, the first write data value on DATAIN, and the desired
data mask value on DM with reference to the table 5.2 and 5.3.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and
simultaneously starts issuing commands to the SDRAM devices.
 One clock after CMDACK was asserted, the user asserts NOP on CMD.
 The user clocks data and data mask values into the SDR SDRAM Controller through
DATAIN and DM.
5.3.4 REFRESH Command
The REFRESH command instructs the SDR SDRAM Controller to perform an ARF command to
the SDRAM. The SDR SDRAM Controller acknowledges the REFRESH command with
CMDACK. Figure 5.3 shows an example timing diagram of the REFRESH command.
Figure 5.3 Timing diagram for a REFRESH command
The following sequence describes the general operation of a REFRESH command:
 The user asserts REFRESH on the CMD input.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.
 The user asserts NOP on CMD.
5.3.5 PRECHARGE Command
Figure 5.4 Timing diagram for a PRECHARGE command
The PRECHARGE command instructs the SDR SDRAM Controller to perform a PCH command
to the SDRAM. The SDR SDRAM Controller acknowledges the command with CMDACK. The
PCH command is also used to generate a burst stop to the SDRAM. Using PRECHARGE to
terminate a burst is only supported in the full-page mode.
Note that the SDR SDRAM Controller adds a latency of 4 clocks from when the host issues a command to when the SDRAM sees the PRECHARGE command. If a full-page read burst is to be stopped after 100 cycles, the PRECHARGE command must be asserted (4 + CL – 1) clocks before the desired end of the burst (the CL – 1 requirement is imposed by the SDRAM devices). So if the CAS latency is 3, the PRECHARGE command must be issued (100 – 3 – 1 – 4) = 92 clocks into the burst.
Figure 5.4 shows an example timing diagram of the PRECHARGE command. The following
sequence describes the general operation of a PRECHARGE command:
 The user asserts PRECHARGE on CMD.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and
simultaneously starts issuing commands to the SDRAM devices.
 The user asserts NOP on CMD.
5.3.6 LOAD_MODE Command
The LOAD_MODE command instructs the SDR SDRAM Controller to perform a LMR command to the SDRAM. The value that is to be written into the SDRAM mode register must be present on ADDR [11:0] with the LOAD_MODE command. The value on ADDR [11:0] is mapped directly to the SDRAM pins A11–A0 when the SDR SDRAM Controller issues the LMR to the SDRAM. Figure 5.5 shows an example timing diagram. The following sequence describes the general operation of a LOAD_MODE command:
 The user asserts LOAD_MODE on CMD.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.
 One clock after the SDR SDRAM Controller asserts CMDACK, the user asserts NOP on CMD.
Figure 5.5 Timing diagram for a LOAD_MODE command
5.3.7 LOAD_REG1 Command
The LOAD_REG1 command instructs the SDR SDRAM Controller to load the internal configuration register REG1 with the value presented on ADDR. Table 5.4 shows the REG1 bit definitions.
Table 5.4 REG1 Bit Definitions
CL is the CAS latency of the SDRAM memory in clock periods and is dependent on the memory device speed grade and clock frequency. Consult the SDRAM data sheet for the appropriate settings. CL must be set to the same value as the CL of the SDRAM memory devices.

RCD is the RAS-to-CAS delay in clock periods and is dependent on the SDRAM speed grade and clock frequency. RCD = INT(tRCD/clock_period), where tRCD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock.

RRD is the refresh-to-RAS delay in clock periods. RRD is dependent on the SDRAM speed grade and clock frequency. RRD = INT(tRRD/clock_period), where tRRD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock.

PM is the page mode bit. If PM = 0, the SDR SDRAM Controller operates in non-page mode. If PM = 1, the SDR SDRAM Controller operates in page mode. See the section “Full-Page Mode Operation” for more information.

BL is the burst length the SDRAM devices have been configured for.
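As a purely hypothetical example of filling in these fields for a 100 MHz controller clock (the tRCD and tRRD numbers below are placeholders; the real values come from the SDRAM data sheet):

module reg1_example;
  // Hypothetical timing numbers for a 100 MHz (10 ns) controller clock.
  localparam CLK_PERIOD_NS = 10;
  localparam T_RCD_NS      = 20;                 // assumed data-sheet tRCD
  localparam T_RRD_NS      = 14;                 // assumed data-sheet tRRD

  localparam CL  = 3;                            // must match the SDRAM mode register
  localparam RCD = T_RCD_NS / CLK_PERIOD_NS;     // INT(tRCD/clock_period) = 2 here
  localparam RRD = T_RRD_NS / CLK_PERIOD_NS;     // INT(tRRD/clock_period) = 1 here
  localparam PM  = 0;                            // non-page mode
  localparam BL  = 8;                            // burst length of the SDRAM devices

  initial $display("CL=%0d RCD=%0d RRD=%0d PM=%0d BL=%0d", CL, RCD, RRD, PM, BL);
endmodule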
5.3.8 LOAD_REG2 Command
The LOAD_REG2 command instructs the SDR SDRAM Controller to load the internal configuration
register REG2. REG2 is a 16-bit value that represents the period between REFRESH commands that the
SDR SDRAM Controller issues. The value is set by the equation int (refresh_period/clock period).
For example, if an SDRAM device connected to the SDR SDRAM Controller has a 64-ms, 4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked by a 100 MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562d. The value that is to be written into REG2 must be presented on the ADDR input simultaneously with the assertion of the LOAD_REG2 command.
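The same REG2 calculation, written out as a small self-checking Verilog fragment (the constant names are illustrative):

module reg2_example;
  // 64 ms / 4096 rows = 15625 ns between REFRESH commands;
  // at a 100 MHz (10 ns) clock this is at most 1562 clock cycles.
  localparam REFRESH_PERIOD_NS = 64_000_000 / 4096;                 // 15625 ns
  localparam CLK_PERIOD_NS     = 10;
  localparam REG2_MAX          = REFRESH_PERIOD_NS / CLK_PERIOD_NS; // 1562

  initial $display("REG2 maximum value = %0d", REG2_MAX);
endmodule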
CHAPTER 6
ELEMENTS OF MEMORY BANK
6.1 DECODER
A decoder is a device which does the reverse operation of an encoder, undoing the encoding so that the original information can be retrieved. The same method used to encode is usually just reversed in order to decode. It is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines.
6.1.1 A 2-to-4 line single-bit decoder
In digital electronics, a decoder can take the form of a multiple-input, multiple-output logic circuit that converts coded inputs into coded outputs, where the input and output codes are different, e.g. n-to-2^n and binary-coded decimal decoders. Enable inputs must be on for the decoder to function; otherwise its outputs assume a single "disabled" output code word. Decoding is necessary in applications such as data multiplexing, 7-segment displays and memory address decoding.

The simplest example of a decoder circuit would be an AND gate, because the output of an AND gate is "High" (1) only when all its inputs are "High". Such an output is called an "active-high output". If a NAND gate is connected instead of the AND gate, the output will be "Low" (0) only when all its inputs are "High". Such an output is called an "active-low output". A slightly more complex decoder would be the n-to-2^n type binary decoders. These types of decoders are combinational circuits that convert binary information from 'n' coded inputs to a maximum of 2^n unique outputs. We say a maximum of 2^n outputs because, in case the 'n'-bit coded information has unused bit combinations, the decoder may have fewer than 2^n outputs.
39
We can have a 2-to-4 decoder, a 3-to-8 decoder or a 4-to-16 decoder. We can form a 3-to-8 decoder from two 2-to-4 decoders (with enable signals).
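As a concrete illustration, a minimal Verilog sketch of a 2-to-4 single-bit decoder with an active-high enable is given below. It is a simplified stand-in for the RTL summarized in Figure 6.1, not the exact code used in the project.

// Minimal 2-to-4 decoder sketch with active-high enable (illustrative only).
module decoder2to4 (
  input  wire       en,   // enable; all outputs low when de-asserted
  input  wire [1:0] a,    // 2-bit binary input
  output reg  [3:0] y     // one-hot, active-high output lines
);
  always @* begin
    y = 4'b0000;          // default "disabled" output code word
    if (en)
      y[a] = 1'b1;        // assert only the selected output line
  end
endmodule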
Figure 6.1 RTL of decoder
Similarly, we can also form a 4-to-16 decoder by combining two 3-to-8 decoders. In this type of
circuit design, the enable inputs of both 3-to-8 decoders originate from a 4th input, which acts as
a selector between the two 3-to-8 decoders. This allows the 4th input to enable either the top or
bottom decoder, which produces outputs of D(0) through D(7) for the first decoder, and D(8)
through D(15) for the second decoder.
Figure 6.2 Simulation Of Decoder
A decoder that contains enable inputs is also known as a decoder-demultiplexer. Thus, we have a
4-to-16 decoder produced by adding a 4th input shared among both decoders, producing 16
outputs.
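The composition described above can be sketched in the same style: the extra address bit drives the enables of the two smaller decoders. The sketch below shows a 3-to-8 decoder built from two of the 2-to-4 decoders sketched earlier (again an illustration, not the project's exact RTL); a 4-to-16 decoder follows the same pattern using two 3-to-8 decoders.

// 3-to-8 decoder built from two 2-to-4 decoders; a[2] selects which half is enabled.
module decoder3to8 (
  input  wire       en,
  input  wire [2:0] a,
  output wire [7:0] y
);
  decoder2to4 u_lo (.en(en & ~a[2]), .a(a[1:0]), .y(y[3:0]));  // outputs D0-D3
  decoder2to4 u_hi (.en(en &  a[2]), .a(a[1:0]), .y(y[7:4]));  // outputs D4-D7
endmodule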
6.2 DEMUX
The data distributor, known more commonly as a demultiplexer or "demux" for short, is the exact opposite of the multiplexer described in Section 6.4. The demultiplexer converts a serial data signal at its input into parallel data.
Figure 6.3 RTL Of DEMUX
The demultiplexer takes a single input data line and switches it to any one of a number of individual output lines, one at a time.
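A minimal sketch of a 1-to-4 demultiplexer in the same illustrative style (not the project's exact RTL behind Figures 6.3 and 6.4):

// 1-to-4 demultiplexer sketch: routes the single input to one selected output line.
module demux1to4 (
  input  wire       d,     // single data input
  input  wire [1:0] sel,   // selects the destination output line
  output reg  [3:0] y      // individual output lines
);
  always @* begin
    y      = 4'b0000;      // non-selected outputs stay low
    y[sel] = d;            // route the input to the selected line
  end
endmodule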
Figure 6.4 Simulation Of DEMUX
6.3 RAM
Random-access memory (RAM) is a form of computer data storage. A random-access memory
device allows data items to be read and written in roughly the same amount of time regardless of
the order in which data items are accessed. In contrast, with other direct-access data storage
media such as hard disks, CD-RWs, DVD-RWs and the older drum memory, the time required to
read and write data items varies significantly depending on their physical locations on the
recording medium, due to mechanical limitations such as media rotation speeds and arm
movement delays.
Today, random-access memory takes the form of integrated circuits. Strictly speaking, modern
types of DRAM are not random access, as data is read in bursts, although the name DRAM /
RAM has stuck. However, many types of SRAM are still random access even in a strict sense.
RAM is normally associated with volatile types of memory (such as DRAM memory modules),
where stored information is lost if the power is removed, although many efforts have been made
to develop non-volatile RAM chips. Other types of non-volatile memory exist that allow random
access for read operations, but either do not allow write operations or have limitations on them.
These include most types of ROM and a type of flash memory called NOR-Flash.
6.3.1 TYPES OF RAM
The two main forms of modern RAM are static RAM (SRAM) and dynamic RAM (DRAM). In
SRAM, a bit of data is stored using the state of a flip-flop. This form of RAM is more expensive
to produce, but is generally faster and requires less power than DRAM and, in modern
computers, is often used as cache memory for the CPU. DRAM stores a bit of data using a
transistor and capacitor pair, which together comprise a memory cell. The capacitor holds a high
or low charge (1 or 0, respectively), and the transistor acts as a switch that lets the control
circuitry on the chip read the capacitor's state of charge or change it. As this form of memory is
less expensive to produce than static RAM, it is the predominant form of computer memory used
in modern computers.
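For illustration, a small synchronous single-port RAM can be sketched in Verilog as below. This is a generic behavioural model with assumed parameter and port names, not the exact memory block shown in Figure 6.5.

// Generic single-port synchronous RAM sketch (behavioural, illustrative only).
module ram_sp #(
  parameter DATA_WIDTH = 8,
  parameter ADDR_WIDTH = 4
) (
  input  wire                  clk,
  input  wire                  we,     // write enable
  input  wire [ADDR_WIDTH-1:0] addr,
  input  wire [DATA_WIDTH-1:0] din,
  output reg  [DATA_WIDTH-1:0] dout
);
  reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];
  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;    // synchronous write
    dout <= mem[addr];     // registered read (returns old data during a write)
  end
endmodule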
Figure 6.5 RTL of RAM
Both static and dynamic RAM are considered volatile, as their state is lost or reset when power is
removed from the system. By contrast, read-only memory (ROM) stores data by permanently
enabling or disabling selected transistors, such that the memory cannot be altered. Writeable
variants of ROM (such as EEPROM and flash memory) share properties of both ROM and RAM,
enabling data to persist without power and to be updated without requiring special equipment.
These persistent forms of semiconductor ROM include USB flash drives, memory cards for
cameras and portable devices, etc. ECC memory (which can be either SRAM or DRAM) includes
special circuitry to detect and/or correct random faults (memory errors) in the stored data,
using parity bits or error correction code.
In general, the term RAM refers solely to solid-state memory devices (either DRAM or SRAM),
and more specifically the main memory in most computers. In optical storage, the term DVD-RAM is somewhat of a misnomer since, unlike CD-RW or DVD-RW, it does not need to be erased before reuse. Nevertheless, a DVD-RAM behaves much like a hard disk drive, if somewhat slower.
Figure 6.6 Simulation of RAM
6.4 MUX
In electronics, a multiplexer is a device that selects one of several analog or digital input signals and forwards the selected input onto a single line. A multiplexer of 2^n inputs has n select lines, which are used to select which input line to send to the output. Multiplexers are mainly used to increase the amount of data that can be sent over a network within a certain amount of time and bandwidth. A multiplexer is also called a data selector.
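A minimal 4-to-1 multiplexer sketch in the same illustrative style, with assumed port names:

// 4-to-1 multiplexer sketch: the select lines choose which input reaches the output.
module mux4to1 (
  input  wire [3:0] d,     // four data inputs
  input  wire [1:0] sel,   // select lines (2 select lines for 2^2 inputs)
  output wire       y      // single output line
);
  assign y = d[sel];       // forward the selected input
endmodule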
Figure 6.7 RTL of MUX
An electronic multiplexer can be considered as a multiple-input, single-output switch, and a
demultiplexer as a single-input, multiple-output switch. The schematic symbol for a multiplexer
is an isosceles trapezoid with the longer parallel side containing the input pins and the short
parallel side containing the output pin.
Conceptually, a 2-to-1 multiplexer behaves like a changeover switch in which the switch position connects the desired input to the output. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal.
Figure 6.8 Simulation Of MUX
6.5 BUFFER
A buffer amplifier (sometimes simply called a buffer) is one that provides electrical impedance
transformation from one circuit to another. Two main types of buffer exist: the voltage buffer and the current buffer.
6.5.1 VOLTAGE BUFFER
A voltage buffer amplifier is used to transfer a voltage from a first circuit, having a high output
impedance level, to a second circuit with a low input impedance level. The interposed buffer
amplifier prevents the second circuit from loading the first circuit unacceptably and interfering
with its desired operation. In the ideal voltage buffer in the diagram, the input resistance is
infinite, the output resistance zero (impedance of an ideal voltage source is zero). Other
properties of the ideal buffer are: perfect linearity, regardless of signal amplitudes; and instant
output response, regardless of the speed of the input signal.
If the voltage is transferred unchanged (the voltage gain Av is 1), the amplifier is a unity gain
buffer; also known as a voltage follower because the output voltage follows or tracks the input
voltage. Although the voltage gain of a voltage buffer amplifier may be (approximately) unity, it
usually provides considerable current gain and thus power gain. However, it is commonplace to
say that it has a gain of 1 (or the equivalent 0 dB), referring to the voltage gain.
As an example, consider a Thévenin source (voltage VA, series resistance RA) driving a resistor load RL. Because of voltage division (also referred to as "loading"), the voltage across the load is only VA RL / (RL + RA). However, if the Thévenin source drives a unity-gain buffer, the voltage input to the amplifier is VA, with no voltage division, because the amplifier input resistance is infinite. At the output, the dependent voltage source delivers voltage Av VA = VA to the load, again without voltage division, because the output resistance of the buffer is zero. A Thévenin equivalent circuit of the combined original Thévenin source and the buffer is an ideal voltage source VA with zero Thévenin resistance.
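In the digital RTL context of this project the buffer is, of course, a logic element rather than an analog amplifier. A minimal sketch of a tri-state output buffer, assuming an output-enable control, is given below; it is illustrative only and not the exact RTL behind Figure 6.9.

// Tri-state buffer sketch: drives the input through when enabled, otherwise high-impedance.
module buffer_tristate (
  input  wire oe,   // output enable
  input  wire d,    // data input
  output wire y     // buffered output
);
  assign y = oe ? d : 1'bz;
endmodule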
Figure 6.9 RTL Of Buffer
6.5.2 CURRENT BUFFER
Typically a current buffer amplifier is used to transfer a current from a first circuit, having a
low output impedance level, to a second circuit with a high input impedance level. The interposed
buffer amplifier prevents the second circuit from loading the first circuit unacceptably and
interfering with its desired operation.
In the ideal current buffer in the diagram, the input impedance is zero and the output impedance
is infinite (impedance of an ideal current source is infinite). Again, other properties of the ideal
buffer are: perfect linearity, regardless of signal amplitudes; and instant output response,
regardless of the speed of the input signal.
For a current buffer, if the current is transferred unchanged (the current gain βi is 1), the amplifier
is again a unity gain buffer; this time known as a current follower because the output
current follows or tracks the input current.
Figure 6.10 Simulation of Buffer
As an example, consider a Norton source (current IA, parallel resistance RA) driving a resistor load RL. Because of current division (also referred to as "loading"), the current delivered to the load is only IA RA / (RL + RA). However, if the Norton source drives a unity-gain current buffer, the current input to the amplifier is IA, with no current division, because the amplifier input resistance is zero. At the output, the dependent current source delivers current βi IA = IA to the load, again without current division, because the output resistance of the buffer is infinite. A Norton equivalent circuit of the combined original Norton source and the buffer is an ideal current source IA with infinite Norton resistance.
6.6 MEMORY BANK
A memory bank is a logical unit of storage in electronics, which is hardware dependent. In
a computer the memory bank may be determined by the memory access controller along with
physical organization of the hardware memory slots.
In a typical synchronous dynamic random-access memory (SDRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM), a bank consists of multiple rows and columns of storage units and is usually spread out across several chips. In a single read or write operation, only one bank is accessed; the bits accessed per bank, per chip, therefore add up to the memory bus width in bits (for a single channel). The size of a bank is further determined by the number of bits in a column and a row, per chip, multiplied by the number of chips in a bank.
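Putting the pieces together, a bank of four RAMs selected by a bank address can be sketched as below, reusing the ram_sp model sketched in Section 6.3. The write enable is demultiplexed to the selected bank and the read data is multiplexed back, mirroring the decoder/demux/RAM/mux structure summarized in Figure 6.11; the widths and names are assumptions for illustration, not the project's exact RTL.

// Memory-bank sketch: four RAM banks, one selected per access (illustrative only).
module memory_bank_sketch #(
  parameter DATA_WIDTH = 8,
  parameter ADDR_WIDTH = 4             // address bits within one bank
) (
  input  wire                  clk,
  input  wire                  we,
  input  wire [1:0]            bank,   // bank-select address
  input  wire [ADDR_WIDTH-1:0] addr,
  input  wire [DATA_WIDTH-1:0] din,
  output reg  [DATA_WIDTH-1:0] dout
);
  wire [DATA_WIDTH-1:0] bank_dout [0:3];

  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : g_bank
      // The write enable is demultiplexed: only the addressed bank is written.
      ram_sp #(.DATA_WIDTH(DATA_WIDTH), .ADDR_WIDTH(ADDR_WIDTH)) u_ram (
        .clk (clk),
        .we  (we && (bank == i)),
        .addr(addr),
        .din (din),
        .dout(bank_dout[i])
      );
    end
  endgenerate

  // Read data is multiplexed from the selected bank.
  always @* begin
    case (bank)
      2'd0:    dout = bank_dout[0];
      2'd1:    dout = bank_dout[1];
      2'd2:    dout = bank_dout[2];
      default: dout = bank_dout[3];
    endcase
  end
endmodule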
Figure 6.11 RTL Of Memory Bank
Some computers have several identical memory banks of RAM, and use bank switching to switch
between them. Harvard architecture computers have (at least) 2 very different banks of memory,
one for program storage and one for data storage.
Figure 6.12 Simulation Of Memory Bank
CHAPTER 7
RESULTS AND CONCLUSIONS
7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON
7.1.1 Project
Table 7.1 Project
7.1.2 Device
Table 7.2 Device
7.1.3 Environment
Table 7.3 Environment
7.1.4 Default Activity
Table 7.4 Default Activity
7.1.5 On-Chip Power Summary
Table 7.5 On-Chip Power Summary
7.1.6 Thermal Summary
Table 7.6 Thermal Summary
7.1.7 Power Supply Summary
Table 7.7 Power Supply Summary
Table 7.8 Power Supply Current
7.1.8 Confidence Level
Table 7.9 Confidence Level
7.1.9 By Hierarchy
Table 7.10 By Hierarchy
7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE
7.2.1 Project
Table 7.11 Project
7.2.2 Device
Table 7.12 Device
7.2.3 Environment
Table 7.13 Environment
7.2.4 Default Activity Rates
Table 7.14 Default Activity
7.2.5 On-Chip Power Summary
Table 7.15 On-Chip Power Summary
7.2.6 Thermal Summary
Table 7.16 Thermal Summary
7.2.7 Power Supply Summary
Table 7.17 Power Supply Summary
Table 7.18 Power Supply Current
7.2.8 Confidence Level
Table 7.19 Confidence Level
7.2.9 By Hierarchy
Table 7.20 By Hierarchy
7.3 CONCLUSION
This project addresses the problem of finding a memory map for firm real-time workloads in the context of SDRAM memory controllers. Existing controllers use either a static memory map or provide only limited configurability. We use the number of banks over which requests are interleaved as a flexible configuration parameter, while previous work considers it a fixed part of the controller architecture. We use this degree of freedom to optimize the memory configuration for the mix of applications and their requirements. This is beneficial for the worst-case performance in terms of bandwidth, latency and power.
CHAPTER 8
FUTURE SCOPE
The advantage of this controller, compared to SDR SDRAM, DDR1 SDRAM and DDR2 SDRAM controllers, is that it synchronizes the data transfer, the data transfer is twice as fast as in the previous generation, and the production cost is very low.
The design has been successfully described in Verilog HDL and synthesized using the Xilinx tool chain.
1. DDR4 SDRAM is the 4th generation of DDR SDRAM.
2. DDR3 SDRAM improves on DDR SDRAM by using differential signalling and lower voltages
to support significant performance advantages over DDR SDRAM.
3. DDR3 SDRAM standards are still being developed and improved.
REFERENCES
[1] C. van Berkel, “Multi-core for Mobile Phones,” in Proc. DATE, 2009.
[2] “International Technology Roadmap for Semiconductors (ITRS),” 2009.
[3] P. Kollig et al., “Heterogeneous Multi-Core Platform for Consumer Multimedia
Applications,” in Proc. DATE, 2009.
[4] L. Steffens et al., “Real-Time Analysis for Memory Access in Media Processing SoCs : A
Practical Approach,” Proc. ECRTS, 2008.
[5] S. Bayliss et al., "Methodology for designing statically scheduled application-specific SDRAM controllers using constrained local search," in Proc. FPT, 2009.
[6] B. Akesson et al., “Architectures and modelling of predictable memory controllers for
improved system integration,” in Proc. DATE, 2011.
[7] J. Reineke et al., “PRET DRAM Controller: Bank Privatization for Predictability and
Temporal Isolation,” in Proc. CODES+ISSS, 2011.
[8] M. Paolieri et al., “An Analyzable Memory Controller for Hard Real-Time CMPs,”
Embedded Systems Letters, IEEE, vol. 1, no. 4, 2009.
[9] Micron Technology Inc., “DDR3-800-1Gb SDRAM Datasheet, 02/10 EN edition,” 2006.
[10] D. Stiliadis et al., "Latency-rate servers: a general model for analysis of traffic scheduling algorithms," IEEE/ACM Trans. Netw., 1998.
[11] B. Akesson et al., "Classification and Analysis of Predictable Memory Patterns," in Proc. RTCSA, 2010.
[12] DDR2 SDRAM Specification, JESD79-2E ed., JEDEC Solid State Technology
Association, 2008.
[13] DDR3 SDRAM Specification, JESD79-3D ed., JEDEC Solid State Technology
Association, 2009.
[14] K. Chandrasekar et al., “Improved Power Modelling of DDR SDRAMs,” in Proc. DSD,
2011.
[15] B. Akesson et al., “Automatic Generation of Efficient Predictable Memory Patterns,” in
Proc. RTCSA, 2011.
Contenu connexe

Tendances

Tendances (20)

FHSS- Frequency Hop Spread Spectrum
FHSS- Frequency Hop Spread SpectrumFHSS- Frequency Hop Spread Spectrum
FHSS- Frequency Hop Spread Spectrum
 
Arm instruction set
Arm instruction setArm instruction set
Arm instruction set
 
USART
USARTUSART
USART
 
Addressing modes of 8085
Addressing modes of 8085Addressing modes of 8085
Addressing modes of 8085
 
Instruction Set Architecture
Instruction Set ArchitectureInstruction Set Architecture
Instruction Set Architecture
 
Internal architecture-of-8086
Internal architecture-of-8086Internal architecture-of-8086
Internal architecture-of-8086
 
Leaky bucket algorithm
Leaky bucket algorithmLeaky bucket algorithm
Leaky bucket algorithm
 
UART
UARTUART
UART
 
Array multiplier
Array multiplierArray multiplier
Array multiplier
 
CS6003 AD HOC AND SENSOR NETWORKS
CS6003 AD HOC AND SENSOR NETWORKSCS6003 AD HOC AND SENSOR NETWORKS
CS6003 AD HOC AND SENSOR NETWORKS
 
Microprocessor 8085 complete
Microprocessor 8085 completeMicroprocessor 8085 complete
Microprocessor 8085 complete
 
VHDL- data types
VHDL- data typesVHDL- data types
VHDL- data types
 
User Datagram Protocol
User Datagram ProtocolUser Datagram Protocol
User Datagram Protocol
 
Scrambling
ScramblingScrambling
Scrambling
 
Flag register 8086 assignment
Flag register 8086 assignmentFlag register 8086 assignment
Flag register 8086 assignment
 
8253ppt
8253ppt8253ppt
8253ppt
 
I2C
I2CI2C
I2C
 
8051 serial communication
8051 serial communication8051 serial communication
8051 serial communication
 
8086 ppt
8086 ppt8086 ppt
8086 ppt
 
Floating point arithmetic operations (1)
Floating point arithmetic operations (1)Floating point arithmetic operations (1)
Floating point arithmetic operations (1)
 

Similaire à Memory map selection of real time sdram controller using verilog full project report

IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET Journal
 
system on chip for telecommand system design
system on chip for telecommand system designsystem on chip for telecommand system design
system on chip for telecommand system designRaghavendra Badager
 
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory SubsystemModeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory SubsystemIRJET Journal
 
z/VM Performance Analysis
z/VM Performance Analysisz/VM Performance Analysis
z/VM Performance AnalysisRodrigo Campos
 
Abstract The Prospect of 3D-IC
Abstract The Prospect of 3D-ICAbstract The Prospect of 3D-IC
Abstract The Prospect of 3D-ICvishnu murthy
 
Exploiting arm linux
Exploiting arm linuxExploiting arm linux
Exploiting arm linuxDan H
 
Oow2007 performance
Oow2007 performanceOow2007 performance
Oow2007 performanceRicky Zhu
 
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...IBM India Smarter Computing
 
Arm cortex (lpc 2148) based motor speed
Arm cortex (lpc 2148) based motor speedArm cortex (lpc 2148) based motor speed
Arm cortex (lpc 2148) based motor speedUday Wankar
 
Introduction to Real Time Java
Introduction to Real Time JavaIntroduction to Real Time Java
Introduction to Real Time JavaDeniz Oguz
 
Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313EMC
 
Autonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with CassandraAutonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with CassandraEmiliano
 
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...IOSR Journals
 

Similaire à Memory map selection of real time sdram controller using verilog full project report (20)

IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
 
Adaptive bank management[1]
Adaptive bank management[1]Adaptive bank management[1]
Adaptive bank management[1]
 
system on chip for telecommand system design
system on chip for telecommand system designsystem on chip for telecommand system design
system on chip for telecommand system design
 
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory SubsystemModeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
 
DDR DIMM Design
DDR DIMM DesignDDR DIMM Design
DDR DIMM Design
 
z/VM Performance Analysis
z/VM Performance Analysisz/VM Performance Analysis
z/VM Performance Analysis
 
RAMinate Invited Talk at NII
RAMinate Invited Talk at NIIRAMinate Invited Talk at NII
RAMinate Invited Talk at NII
 
Abstract The Prospect of 3D-IC
Abstract The Prospect of 3D-ICAbstract The Prospect of 3D-IC
Abstract The Prospect of 3D-IC
 
Exploiting arm linux
Exploiting arm linuxExploiting arm linux
Exploiting arm linux
 
IBM Power10.pdf
IBM Power10.pdfIBM Power10.pdf
IBM Power10.pdf
 
Rhel Tuningand Optimizationfor Oracle V11
Rhel Tuningand Optimizationfor Oracle V11Rhel Tuningand Optimizationfor Oracle V11
Rhel Tuningand Optimizationfor Oracle V11
 
Oow2007 performance
Oow2007 performanceOow2007 performance
Oow2007 performance
 
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
 
Arm cortex (lpc 2148) based motor speed
Arm cortex (lpc 2148) based motor speedArm cortex (lpc 2148) based motor speed
Arm cortex (lpc 2148) based motor speed
 
dissertation
dissertationdissertation
dissertation
 
Introduction to Real Time Java
Introduction to Real Time JavaIntroduction to Real Time Java
Introduction to Real Time Java
 
Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313
 
Autonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with CassandraAutonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with Cassandra
 
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
 
Memperf
MemperfMemperf
Memperf
 

Plus de rahul kumar verma

SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment  SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment rahul kumar verma
 
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignmentSMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignmentrahul kumar verma
 
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...rahul kumar verma
 
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignmentSMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignmentrahul kumar verma
 
SMU DRIVE SPRING 2017 MBA101– Management Process and Organizational Behavio...
 SMU DRIVE SPRING 2017  MBA101– Management Process and Organizational Behavio... SMU DRIVE SPRING 2017  MBA101– Management Process and Organizational Behavio...
SMU DRIVE SPRING 2017 MBA101– Management Process and Organizational Behavio...rahul kumar verma
 
SMU DRIVE SPRING 2017 MBA105 - MANAGERIAL ECONOMICS free solved assignment
 SMU DRIVE SPRING 2017  MBA105 - MANAGERIAL ECONOMICS free solved assignment SMU DRIVE SPRING 2017  MBA105 - MANAGERIAL ECONOMICS free solved assignment
SMU DRIVE SPRING 2017 MBA105 - MANAGERIAL ECONOMICS free solved assignmentrahul kumar verma
 
SMU DRIVE SPRING 2017 MBA106 –Human Resource Management free solved assignment
SMU DRIVE SPRING 2017  MBA106 –Human Resource Management free solved assignmentSMU DRIVE SPRING 2017  MBA106 –Human Resource Management free solved assignment
SMU DRIVE SPRING 2017 MBA106 –Human Resource Management free solved assignmentrahul kumar verma
 
SMU DRIVE SPRING 2017 MBA102 - Business Communication free solved assignment
SMU DRIVE SPRING 2017  MBA102 - Business Communication free solved assignment SMU DRIVE SPRING 2017  MBA102 - Business Communication free solved assignment
SMU DRIVE SPRING 2017 MBA102 - Business Communication free solved assignment rahul kumar verma
 
FPGA in outer space seminar report
FPGA in outer space seminar reportFPGA in outer space seminar report
FPGA in outer space seminar reportrahul kumar verma
 

Plus de rahul kumar verma (11)

SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment  SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
 
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignmentSMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
 
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
 
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignmentSMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
 
SMU DRIVE SPRING 2017 MBA101– Management Process and Organizational Behavio...
 SMU DRIVE SPRING 2017  MBA101– Management Process and Organizational Behavio... SMU DRIVE SPRING 2017  MBA101– Management Process and Organizational Behavio...
SMU DRIVE SPRING 2017 MBA101– Management Process and Organizational Behavio...
 
SMU DRIVE SPRING 2017 MBA105 - MANAGERIAL ECONOMICS free solved assignment
 SMU DRIVE SPRING 2017  MBA105 - MANAGERIAL ECONOMICS free solved assignment SMU DRIVE SPRING 2017  MBA105 - MANAGERIAL ECONOMICS free solved assignment
SMU DRIVE SPRING 2017 MBA105 - MANAGERIAL ECONOMICS free solved assignment
 
SMU DRIVE SPRING 2017 MBA106 –Human Resource Management free solved assignment
SMU DRIVE SPRING 2017  MBA106 –Human Resource Management free solved assignmentSMU DRIVE SPRING 2017  MBA106 –Human Resource Management free solved assignment
SMU DRIVE SPRING 2017 MBA106 –Human Resource Management free solved assignment
 
SMU DRIVE SPRING 2017 MBA102 - Business Communication free solved assignment
SMU DRIVE SPRING 2017  MBA102 - Business Communication free solved assignment SMU DRIVE SPRING 2017  MBA102 - Business Communication free solved assignment
SMU DRIVE SPRING 2017 MBA102 - Business Communication free solved assignment
 
CCNA DUMPS 640-802
CCNA DUMPS 640-802CCNA DUMPS 640-802
CCNA DUMPS 640-802
 
CCNA DUMPS 200-120
CCNA DUMPS 200-120CCNA DUMPS 200-120
CCNA DUMPS 200-120
 
FPGA in outer space seminar report
FPGA in outer space seminar reportFPGA in outer space seminar report
FPGA in outer space seminar report
 

Dernier

Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 

Dernier (20)

Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 

Memory map selection of real time sdram controller using verilog full project report

  • 1. Project Report On Memory map selection of real time SDRAM controller using Verilog By RAHUL VERMA (9015694258)
  • 2. vi TABLE OF CONTENTS Page DECLARATION ............................................................................................................................ii CERTIFICATE .............................................................................................................................iii ACKNOWLEDGEMENTS ..........................................................................................................iv ABSTRACT ..................................................................................................................................vi LIST OF FIGURES .....................................................................................................................vii LIST OF TABLES........................................................................................................................viii LIST OF ABBREVIATION……………………………………………………………………...ix CHAPTER 1 (INTRODUCTION)……………………………………………………………..01 1.1 LITERATURE SURVEY……………………………………………………...02 1.2 GOAL OF THE PROJECT…………………………………………………….03 CHAPTER 2 (BACKGROUND)………………………………………………………………04 2.1 RANDOM ACCESS MEMORY…………………………………........ ……..04 2.2 STATIC RANDOM ACCESS MEMORY …………………………………....04 2.3 DYNAMIC RANDOM ACCESS MEMORY ……………..………………….05 2.4 DEVELOPMENT OF DRAM ………………………………………………...06 2.4.1 DRAM …………………………………………………………………...07 2.4.2 SYNCHRONOUS DRAM……………………………………………….07 2.4.3 DDR1SDRAM…………………………………………………………....08 2.4.4 DDR2SDRAM……………………………………………………………08
  • 3. vii 2.4.5 DDR3SDRAM………………………………………………………..…09 2.5 TIMELINE……………………………………………………………………09 CHAPTER 3 (METHODOLOGY)…………………………………………………………...11 3.1 HARDWARE…………………………………………………………………11 3.1.1 VIRTEX-6FPGA………………………………………………………..11 3.1.2 ML605 BOARD………………………………………………………...12 3.2 TOOLS………………………………………………………………………..12 3.2.1 XILINX INTERGRATED SOFTWARE ENVIRONMENT(ISE)……..13 3.2.2 SYNTHESIS AND SIMULATION……………………. ……………..14 3.2.3 IMPLEMENTATION AND HARDWARE VALIDATION…………...14 3.2.4 ANALYSIS OF TURN-AROUND TIMES…………………………….17 3.2.5 XILINX CORE GENERATOR…………………………………………19 CHAPTER 4 (ARCHITECTURE)……………………………………………………………20 4.1 CONTROL INTERFACE MODULE…………………………………………21 4.2 COMMAND MODULE…………………….……………….………………...22 4.3 DAPATH MODULE…………………………………………………………..24 CHAPTER 5 (OPERATION).....................................................................................................25 5.1 SDRAM OVERVIEW…………………………………………………………26 5.2 FUNCTIONAL DESCRIPTION………………………………………………27 5.3 SDRAM CONTROLLER COMMAND INTERFACE……………………….28 5.3.1 NOP COMMAND……………………………………………………….29 5.3.2 READA COMMAND…………………………………………………...30 5.3.3 WRITEA COMMAND……………………………………………….…31 5.3.4 REFRESH COMMAND…………………………………………….…..32
  • 4. viii 5.3.5 PRECHARGE COMMAND………………………………………….....34 5.3.6 LOAD_MODE COMMAND……………………………………………35 5.3.7 LOAD_REG1 COMMAND……………………………………………..36 5.3.8 LOAD_REG2 COMMAND……………………………………………..37 CHAPTER 6 (ELEMENTS OF MEMORY BANK)…………………………………………38 6.1 DECODER…………………………………………………………………….38 6.1.1 A 2 TO 4 SINGLE BIT DECODER…………………………………….38 6.2 DEMUX………………………………………………………………………..40 6.3 RAM…………………………………………………………………………...41 6.3.1 TYPES OF RAM………………………………………………………...42 6.4 MUX…………………………………………………………………………...44 6.5 BUFFER……………………………………………………………………….45 6.5.1 VOLTAGE BUFFER…………………………………………………….46 6.5.2 CURRENT BUFFER…………………………………………………….47 6.6 MEMORY BANK……………………………………………………………..48 CHAPTER 7 (RESULT AND CONCLUSIONS)……………………………………………..51 7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON…………..………51 7.1.1 PROJECT………………………………………………………...………51 7.1.2 DEVICE ………………………………………………………………….51 7.1.3 ENVIRONMENT …………………………………………………….,,,,.52 7.1.4 DEFAULT ACTIVITY………...………………………………….……..52 7.1.5 ON-CHIP POWER SUMMARY………………………………………...53 7.1.6 THERMAL SUMMARY………………………………………………...53 7.1.7 POWER SUPPLY SUMMARY………………………………………….53 7.1.8 CONFIDENCE LEVEL………………………………………………….54 7.1.9 BY HIERARCHY………………………………………………………..55
  • 5. ix 7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE…..56 7.2.1. PROJECT………………………………………………………………..56 7.2.2 DEVICE……………………………………………………………….....56 7.2.3 ENVIRONMENT………………………………………………………...57 7.2.4 DEFAULT ACTIVITY RATES…………………………………………57 7.2.5 ON-CHIP POWER SUMMARY………………………………………...58 7.2.6 THERMAL SUMMARY………………………………………………...58 7.2.7 POWER SUPPLY SUMMARY……………………………………...….58 7.2.8 CONFIDENCE LEVEL………………………………………………….59 7.2.9 BY HIERARCHY………………………………………………………..60 7.3 CONCLUSION…………………………………………………………….….60 CHAPTER 8 (FUTURE SCOPE)……………………………………………………………...61 REFERENCES...............................................................................................................................62
  • 6. x LIST OF FIGURES Page Figure 2.1 DRAM Row Access Latency vs. Year 09 Figure 2.2 DRAM Column Address Time vs. Year 10 Figure 3.1 Screenshot of ISE Project Navigator 13 Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation 15 Figure 3.3 ISim Screen Shot 18 Figure 3.4 CHIPSCOPE Screen Shot 19 Figure 4.0 Architecture of SDRAM controller 20 Figure 4.1 Control Interface Module 21 Figure 4.2 Command Module Block Diagram 23 Figure 4.3 Data Path Module 24 Figure 5.0 SDR SDRAM Controller System-Level Diagram 25 Figure 5.1 Timing diagram for a READA command 30 Figure 5.2 Timing diagram for a WRITEA command 31 Figure 5.3 Timing diagram for a REFRESH command 32 Figure 5.4Timing diagram for a PRECHARGE command 34 Figure 5.5 Timing diagram for a PRECHARGE command 35 Figure 6.1 RTL of decoder 39
  • 7. xi Figure 6.2 Simulation of Decoder 40 Figure 6.3 RTL of DEMUX 41 Figure 6.4 Simulation Of DEMUX 42 Figure 6.5 RTL of RAM 44 Figure 6.6 Simulation of RAM 44 Figure 6.7 RTL of MUX 46 Figure 6.8 Simulation of MUX 46 Figure 6.9 RTL of Buffer 48 Figure 6.10 Simulation of Buffer 49 Figure 6.11 RTL of Memory Bank 50 Figure 6.12 Simulation of Memory Bank 50
  • 8. xii LIST OF TABLES Page Table 5.1 SDRAM Bus Commands 26 Table 5.2 Interface Signals 28 Table 5.3 Interface Commands 29 Table 5.4 REG1 Bit Definitions 36 Table 7.1 Project 51 Table 7.2 Device 51 Table 7.3 Environment 52 Table 7.4 Default Activity 52 Table 7.5 On-Chip Power Summary 53 Table 7.6 Thermal Summary 53 Table 7.7 Power Supply Summary 53 Table 7.8 Power Supply Current 54 Table 7.9 Confidence Level 54 Table 7.10 By Hierarchy 55 Table 7.11 Project 56 Table 7.12 Device 56
  • 9. xiii Table 7.13 Environment 57 Table 7.14 Default Activity 57 Table 7.15 On-Chip Power Summary 58 Table 7.16 Thermal Summary 58 Table 7.17 Power Supply Summary 58 Table 7.18 Power Supply Current 59 Table 7.19 Confidence Level 59 Table 7.20 By Hierarchy 60
  • 10. xiv LIST OF ABBREVIATIONS A/D Analog To Digital CAS Column Address Strobing CLB Configurable Logic Block DRAM Dynamic Random-Access Memory FPGA Field-Programmable Gate Array ISE Integrated Software Environment I/O Input/ Output LUTs Look-Up Tables NCD Native Circuit Description RAM Random Access Memory RAS Row Address Strobing ROM Read Only Memory SDRAM Synchronous Dynamic Random-Access Memory SRAM Static Random-Access Memory XST Xilinx Synthesis Technology
  • 11. 1 CHAPTER 1 INTRODUCTION Embedded applications with real-time requirements are mapped to heterogeneous multiprocessor systems. The computational demands placed upon these systems are continuously increasing, while power and area budgets limit the amount of resources that can be expended to reduce costs, applications are often forced to share hardware resources. Functional correctness for Real- Time application is only guaranteed if their timing requirements are considered throughout the entire system when the requirements are not met, it may cause an unacceptable loss of functionality or severe quality degradation. We focus on the real-time properties of the (off-chip) memory. SDRAM is a commonly used memory type because it provides a large amount of storage space at low cost per bit. It comprises a hierarchical structure of banks and rows that have to be opened and closed explicitly by the memory controller, where only one row in each bank can be open at a time. Requests to the open row are served at a low latency, while request to a different row results in a high latency, since it requires closing the open row and subsequent opening of the requested row. Locality thus strongly influences the performance of the memory subsystem. The worst-case (minimum) bandwidth and worst-case (maxi- mum) latency are determined by the way requests are mapped to the memory. The worst-case latency can be optimized by accessing the memory at a small granularity (i.e. few words), such that the individual requests take a small amount of time to complete. This allows fine-grained sharing of the memory resource, at the expense of efficiency, since the overhead of opening and closing rows is amortized over only a small number of bits. Latency sensitive requests like cache misses favor this configuration. Conversely, to optimize for bandwidth, the memory has to be used as efficiently as possible, which requires memory maps that use a large access granularity.
  • 12. 2 Existing memory controllers offer only limited configurability of the memory mapping and are unable to balance this trade-off based on the application requirements .A memory controller must take the latency and bandwidth requirements of all of its applications into account, while staying within the given power budget. This requires an understanding of the effect that different memory maps have on the attainable worst-case bandwidth, latency and power. 1.1 LITERATURE SURVEY Synchronous DRAM (SDRAM) has become a mainstream memory of choice in embedded system memory design due to its speed, burst access and pipeline features. For high-end applications using processors such as Motorola MPC 8260 or Intel StrongArm, the interface to the SDRAM is supported by the processor’s built-in peripheral module. However, for other applications, the system designer must design a controller to provide proper commands for SDRAM initialization, read/write accesses and memory refresh. In some cases, SDRAM is chosen because the previous generations of DRAM (FP and EDO) are either end-of-life or not recommended for new designs by the memory vendors. From the board design point of view, design using earlier generations of DRAM is much easier and more straightforward than using SDRAM unless the system bus master provides the SDRAM interface module as mentioned above. This SDRAM controller reference design, located between the SDRAM and the bus master, reduces the user’s effort to deal with the SDRAM command interface by providing a simple generic system interface to the bus master. In today's SDRAM market, there are two major types of SDRAM distinguished by their data transfer rates. The most common single data rate (SDR) SDRAM transfers data on the rising edge of the clock. The other is the double data rate (DDR) SDRAM which transfers data on both the rising and falling edge to double the data transfer throughput. Other than the data transfer phase, the different power-on initialization and mode register definitions, these two SDRAMs share the same command set and basic design concepts. This reference design is targeted for SDR SDRAM, however, due to the similarity of SDR and DDR SDRAM, this design can also be modified for a DDR SDRAM controller.
  • 13. 3 For illustration purposes, the Micron SDR SDRAM MT48LC32M4A2 (8Meg x 4 x 4 banks) is chosen for this design. Also, this design has been verified by using Micron’s simulation model. It is highly recommended to download the simulation model from the SDRAM vendors for timing simulation when any modifications are made to this design. Several SDRAM controllers focusing on real-time applications have been proposed, all trying to maximize the worst case performance. Uses a static command schedule computed at design time. Full knowledge of the application behavior is thus required, making it unable to deal with dynamism in the request streams. The controller proposed in dynamically schedules pre- computed sequences of SDRAM commands according to a fixed set of scheduling rules. The controller proposed in follows a similar approach. Dynamically schedules commands at run-time according to a set of rules from which an upper bound on the latency of a request is determined and use a memory map that always interleaves requests over all banks in the SDRAM, which sets a high lower bound on the smallest request size that can be supported efficiently. Supports multiple bursts to each bank in an access to increase guaranteed bandwidth for large requests. Allows only single burst accesses to all banks in a fixed sequential manner, although multiple banks can be clustered to create a single logical resource. None of the mentioned controllers take power into account, despite it being an increasingly important design constraint. 1.1 GOAL OF THE PROJECT 1) We explore the full memory map design space by allowing requests to be interleaved over a variable number of banks. This reduces the minimum access granularity and can thus be beneficial for applications with small requests or tight latency constraints. 2) We propose a configuration methodology that is aware of the real-time and power constraints, such that an optimal memory map can be selected.
  • 14. 4 CHAPTER 2 BACKGROUND There are two different types of random access memory: synchronous and dynamic. Synchronous random access memory (SRAM) is used for high-speed, low power applications while dynamic random access memory (DRAM) is used for its low cost and high density. Designers have been working to make DRAM faster and more energy efficient .The following sections will discuss the differences between these two types of RAM, as well as present the progression of DRAM towards a faster, more energy efficient design. 2.1 RANDOM ACCESS MEMORY Today, the most common type of memory used in digital systems is random access memory (RAM). The time it takes to access RAM is not affected by the data’s location in memory. RAM is volatile, meaning if power is removed, then the stored data is lost. As a result, RAM cannot be used for permanent storage. However, RAM is used during runtime to quickly store and retrieve data that is being operated on by a computer. In contrast, nonvolatile memory, such as hard disks, can be used for storing data even when not powered on. Unfortunately, it takes much longer for the computer to store and access data from this memory. There are two types of RAM: static and dynamic. In the following sections the differences between the two types and the evolution of DRAM will be discussed. 2.2 STATIC RANDOM ACCESS MEMORY Static random access memory (SRAM) stores data as long as power is being supplied to the chip.
  • 15. 5 Each memory cell of SRAM stores one bit of data using six transistors: a flip flop and two access transistors (i.e. four transistors). SRAM is the faster of the two types of RAM because it does not involve capacitors, which involve sense amplification of a small charge. For this reason, it is used in cache memory of computers. Additionally, SRAM requires a very small amount of power to maintain its data in standby mode Although SRAM is fast and energy efficient it is also expensive due to the amount of silicon needed for its large cell size. This presented the need for a denser memory cell, which brought about DRAM. 2.3 DYNAMIC RANDOM ACCESS MEMORY According to Wakerly , “In order to build RAMs with higher density (more bits per chip), chip designers invented memory cells that use as little as one transistor per bit. Each DRAM cell consists of one transistor and a capacitor. Since capacitors “leak” or lose charge over time, DRAM must have a refresh cycle to prevent data loss. According to a high-performance DRAM study on earlier versions of DRAM, DRAM’s refresh cycle is one reason DRAM is slower than SRAM. The cells of DRAM use sense amplifiers to transmit data to the output buffer in the case of a read and transmit data back to the memory cell in the case of a refresh. During a refresh cycle, the sense amplifier reads the degraded value on a capacitor into a D- Latch and writes back the same value to the capacitor so it is charged correctly for 1 or 0. Since all rows of memory must be refreshed and the sense amplifier must determine the value of a, already small, degenerated capacitance, refresh takes a significant amount of time. The refresh cycle typically occurs about every 64 milliseconds the refresh rate of the latest DRAM (DDR3) is about 1 microsecond. Although refresh increases memory access time, according to a high-performance DRAM study on earlier versions of DRAM, the greatest amount of time is lost during row addressing, more specifically, “[extracting] the required data from the sense amps/row caches” . During addressing, the memory controller first strobes the row address (RAS) onto the address bus. Once the RAS is sent, a sense amplifier (one for each cell in the row) determines if a charge indicating a 1 or 0 is loaded into each capacitor.
  • 16. 6 This step is long because “the sense amplifier has to read a very weak charge” and “the row is formed by the gates of memory cells.” The controller then chooses a cell in the row from which to read from by strobing the column address (CAS) onto the address bus. A write requires the enable signal to be asserted at the same time as the CAS, while a read requires the enable signal to be de-asserted. The time it takes the data to move onto the bus after the CAS is called the CAS latency. Although recent generations of DRAM are still slower than SRAM, DRAM is used when a largeramount of memory is required since it is less expensive. For example, in embedded systems, a small block of SRAM is used for the critical data path, and a large block of DRAM is used to satisfy all other needs .The following section will discuss the development of DRAM into a faster, more energy efficient memory. 2.4 DEVELOPMENT OF DRAM Many factors are considered in the development of high performance RAM. Ideally, the developer would always like memory to transfer more data and respond in less time; memory would have higher bandwidth and lower latency. However, improving upon one factor often involves sacrificing the other. Bandwidth is the amount of data transferred per second. It depends on the width of the data bus and the frequency at which data is being transferred. Latency is the time between when the address strobe is sent to memory and when the data is placed on the data bus. DRAM is slower than SRAM because it periodically disables the refresh cycle and because it takes a much longer time to extract data onto the memory bus. Advancements have been, however, to several different aspects of DRAM to increase bandwidth and decrease latency. Over time, DRAM has evolved to become faster and more energy efficient by decreasing in cell size and increasing in capacity. In the following section, we will look at different types of DRAM and how DDR3 memory has come to be.
  • 17. 7 2.4.1 DRAM One of the reasons the original DRAM was very slow is its extensive addressing overhead. In the original DRAM, an address was required for every 64-bit access to memory. Each access took six clock cycles. For four 64-bit accesses to consecutive addresses in memory, the notation for timing was 6-6-6-6. Dashes separate memory accesses and the numbers indicate how long each access takes. In this DRAM timing example, it took 24 cycles to access the memory four times. In contrast, more recent DRAM implements burst technology, which can send many 64-bit words to consecutive addresses. While the first access still takes six clock cycles due to memory addressing, each of the next three adjacent addresses can be accessed in as little as one clock cycle since the addressing does not need to be repeated. During burst mode, the timing would be 6-1-1-1, a total of nine clock cycles. The original DRAM is also slower than its descendants because it is asynchronous. This means there is no memory bus clock to synchronize the input and output signals of the memory chip. The timing specifications are not based on a clock edge, but rather on maximum and minimum timing values (in seconds). The user would need to design a state machine with idle states, which may be inconsistent when running the memory at different frequencies. 2.4.2 Synchronous DRAM In order to decrease latency, SDRAM utilizes a memory bus clock to synchronize signals to and from the system and memory. Synchronization ensures that the memory controller does not need to follow strict timing; it simplifies the implemented logic and reduces memory access latency. With a synchronous bus, data is available at each clock cycle. SDRAM divides memory into two to four banks for concurrent access to different parts of memory. Simultaneous access allows continuous data flow by ensuring there will always be a memory bank ready for access. The addition of banks adds another segment to the addressing, resulting in a bank, row, and column address.
  • 18. 8 The memory controller determines whether an access addresses the same bank and row as the previous access, in which case only a column address strobe must be sent. This allows the access to occur much more quickly and can decrease overall latency. 2.4.3 DDR1 SDRAM DDR1 SDRAM (the first generation of DDR SDRAM) doubles the data rate (hence the term DDR) of SDRAM without changing the clock speed or frequency. DDR transfers data on both the rising and falling edges of the clock, has a prefetch buffer, and uses low-voltage signaling, which makes it more energy efficient than previous designs. Unlike SDRAM, which transfers 1 bit per clock cycle from the memory array to the data queue, DDR1 transfers 2 bits to the queue in two separate pipelines. The bits are released in order on the same output line. This is called a 2n-prefetch architecture. In addition, DDR1 utilizes double-transition clocking by triggering on both the rising and falling edges of the clock to transfer data. As a result, the bandwidth of DDR1 is doubled without an increase in the clock frequency. In addition to doubling the bandwidth, DDR1 made advances in energy efficiency. DDR1 can operate at 2.5 V instead of the 3.3 V operating point of SDRAM thanks to low-voltage signaling technology. 2.4.4 DDR2 SDRAM Data rates of DDR2 SDRAM are up to eight times those of the original SDRAM. At an operating voltage of 1.8 V, it achieves lower power consumption than DDR1. DDR2 SDRAM has a 4-bit prefetch buffer, an improvement over the DDR1 2-bit prefetch. This means that 4 bits are transferred per clock cycle from the memory array to the data bus, which increases bandwidth.
  • 19. 9 2.4.5 DDR3 SDRAM DDR3 provides two burst modes for both reading and writing: burst chop (BC4) and burst length eight (BL8). BC4 allows bursts of four by treating data as though half of it is masked. This creates a smooth transition when switching from DDR2 to DDR3 memory. However, burst mode BL8 is the primary burst mode. BL8 allows the most data to be transferred in the least amount of time; it transfers the greatest number of 64-bit data packets (eight) to or from consecutive addresses in memory, which means addressing occurs once for every eight data packets sent. In order to support a burst length of eight data packets, DDR3 SDRAM has an 8-bit prefetch buffer. DDR3, like its predecessors, improves not only bandwidth but also energy conservation. Power consumption of DDR3 can be up to 30 percent less than that of DDR2. The DDR3 operating voltage is the lowest yet, at 1.5 V, and low-voltage versions are supported at 1.35 V. 2.5 TIMELINE Ideally, memory performance would improve at the same rate as central processing unit (CPU) performance. However, memory latency has only improved about five percent each year. The longest latency (RAS latency) of the newest release of DRAM for each year is shown in the plot in Figure 2.1. Figure 2.1 DRAM Row Access Latency vs. Year
  • 20. 10 As seen in Figure 2.1, the row access latency decreases linearly with every new release of DRAM until 1996. Once SDRAM is released in 1996, the difference in latency from year to year is much smaller. With recent memory releases it is much more difficult to reduce RAS latency. This can be seen especially for the DDR2 and DDR3 memory releases from 2006 to 2012. CAS latency, unlike RAS latency, consistently decreases (bandwidth increases) with every memory release, and in the new DDR3 memory it is very close to 0 ns. Figure 2.2 shows the column access latency. Figure 2.2 DRAM Column Address Time vs. Year Looking at some prominent areas of the CAS graph, it can be seen in Figure 2.2 that bandwidth greatly increased (CAS decreased) from 1983 to 1986. This is due to the switch from NMOS DRAMs to CMOS DRAMs. In 1996 the first SDRAM was released. The CAS latency decreased (bandwidth increased) due to synchronization and banking. In later years, the CAS latency does not decrease by much, but this is expected since the latency is already much smaller. Comparing Figure 2.2 to Figure 2.1, CAS time decreases much more drastically than RAS time. This means the bandwidth greatly improves, while latency improves much more slowly. In 2010, when DDR2 was released, it can be seen that latency was sacrificed (Figure 2.1) for an increase in bandwidth (Figure 2.2).
  • 21. 11 CHAPTER 3 METHODOLOGY In this section the ML605 and Virtex-6 board hardware is described, as well as the tools utilized for design and validation. The Xilinx Integrated Software Environment (ISE) was used for design, and iSim and ChipScope were used for validation in simulation and in hardware. 3.1 HARDWARE 3.1.1 Virtex-6 FPGA The Virtex-6 FPGA (XC6VLX240T) is used to implement the arbiter. This FPGA has 241,152 logic cells and is organized into banks (40 pins per bank). These logic cells, or slices, are composed of four look-up tables (LUTs), multiplexers and arithmetic carry logic. LUTs implement Boolean functions, and multiplexers enable combinatorial logic. Two slices form a configurable logic block (CLB). In order to distribute a clock signal to all these logic blocks, the FPGA has five types of clock lines: BUFG, BUFR, BUFIO, BUFH, and high-performance clock. These lines satisfy "requirements of high fan out, short propagation delay, and extremely low skew". The clock lines are also split into categories depending on the sections of the FPGA and the components they drive. The three categories are: global, regional, and I/O lines. Global clock lines drive all flip-flops, clock enables, and many logic inputs. Regional clock lines drive all clock destinations in their region and two bordering regions. There are six to eighteen regions in an FPGA. Finally, I/O clock lines are very fast and only drive I/O logic and serializer/deserializer circuits.
  • 22. 12 3.1.2 ML605 Board The Virtex-6 FPGA is included on the ML605 Development Board. In addition to the FPGA, the development board includes a 512 MB DDR3 small outline dual inline memory module (SODIMM), which our design arbitrates access to. A SODIMM is the type of board the memory is manufactured on. The board also includes 32 MB of linear BPI Flash and 8 Kb of IIC EEPROM. Communication mechanisms provided on the board include Ethernet, an SFP transceiver connector, a GTX port, a USB-to-UART bridge, USB host and peripheral ports, and PCI Express. The only connection used during this project was the USB JTAG connector. It was used to program and debug the FPGA from the host computer. There are three clock sources on the board: a 200 MHz differential oscillator, a 66 MHz single-ended oscillator, and SMA connectors for an external clock. This project utilizes the 200 MHz oscillator. Peripherals on the ML605 board were useful for debugging purposes. The push buttons were used to trigger sections of code execution in ChipScope, such as reading from and writing to memory. Dip switches acted as configuration inputs to our code. For example, they acted as a safety to ensure the buttons on the board were not automatically set to active when the code was downloaded to the board. In addition, the value on the switches indicated which system would begin writing first for debugging purposes. LEDs were used to check the functionality of sections of code as well, and for additional validation they can be used to indicate if an error has occurred. Although we did not use it, the ML605 board provides an LCD. 3.2 TOOLS Now that the hardware where the design is placed has been described, the software used to manipulate the design can be described. The tools for design include those provided within the Xilinx Integrated Software Environment, and the tools used for validation include iSim and ChipScope. This section also looks at the turn-around time for both validation tools and what it means for the design process.
  • 23. 13 3.2.1 Xilinx Integrated Software Environment (ISE) We designed the arbiter using the Verilog hardware description language in the Xilinx Integrated Software Environment (ISE). ISE is an environment in which the user can "take [their] design from design entry through Xilinx device programming". The main workbench for ISE is ISE Project Navigator. The Project Navigator tool allows the user to effectively manage their design and call upon development processes. Figure 3.1 shows a screen shot of ISE Project Navigator. Figure 3.1 Screen Shot of ISE Project Navigator Figure 3.1 shows some of the main windows in ISE Project Navigator. On the right-hand side is the window for code entry. The hierarchical view of modules in the design appears on the left, and when implementation is selected at the top, the design implementation progress is shown in the bottom window. If simulation were selected instead of implementation, there would be an option to run the design in simulation. The main processes called upon by ISE are synthesis, implementation, and bit stream generation. During synthesis, Xilinx Synthesis Technology (XST) is called upon. XST synthesizes Verilog, VHDL or mixed-language designs and creates netlist files. Netlist files, or NGC files, contain the design logic and constraints.
  • 24. 14 They are saved for use in the implementation process. During synthesis, XST checks for synthesis errors (parsing) and infers macros from the code. When XST infers macros, it recognizes parts of the code that can be replaced with components from its library, such as MUXes and RAMs, and encodes them in a way that is best for reduced area and/or increased speed. Implementation is the longest process performed on the design. The first step of implementation is to combine the netlists and constraints into a design/NGD file. The NGD file is the design file reduced to Xilinx primitives. This process is called translation. During the second step, mapping, the design is fitted into the target device. This involves turning logic into FPGA elements such as configurable logic blocks. Mapping produces a native circuit description (NCD) file. The third step, place and route, uses the mapped NCD file to place and route the design according to the timing constraints. Finally, the program file is generated and, at the end of this step, a bit stream is ready to be downloaded to the board. 3.2.2 Synthesis and Simulation Once the design has been synthesized, simulation of the design is possible. Simulating a design enables verification of logic functionality and timing. We used the simulation tool in ISE (iSim) to view timing and signal values. In order to utilize iSim, we created a test bench to provide the design with stimulus. Since simulation only requires design synthesis, it is a relatively fast process. The short turn-around time of simulation means we were able to iteratively test small changes to the design and, therefore, debug our code efficiently. 3.2.3 Implementation and Hardware Validation Once the design was working in simulation, we still needed to test the design's functionality in hardware. Testing the design in hardware is the most reliable validation method. In order to download the design to the board, it first needs to be implemented in ISE.
  • 25. 15 Implementation has a much longer turn-around time than synthesis, so while functionality in hardware ensures the design is working, simulation is the practical choice for iterative verification. In order to test our design in hardware, we utilized ChipScope Pro Analyzer, a GUI which allows the user to "configure [their] device, choose triggers, setup the console, and view results of the capture on the fly". In order to use ChipScope Pro, you may either insert ChipScope Pro cores into the design using the Core Generator, a tool that can be accessed in ISE Project Navigator, or utilize the PlanAhead or Core Inserter tool, which automatically inserts cores into the design netlist for you. Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation One method of inserting ChipScope cores into the design is by utilizing the PlanAhead software. The PlanAhead tool enables the creation of floorplans.
  • 26. 16 Floorplans provide an initial view of "the design's interconnect flow and logic module sizes." This helps the designer to "avoid timing, utilization, and routing congestion issues." PlanAhead also allows the designer to create and configure I/O ports and analyze implementation results, which aids in the discovery of bottlenecks in the design. For our project, however, we utilized PlanAhead only for its ability to automatically insert ChipScope cores. PlanAhead proved to be inefficient for our purposes since, many times, when a change was made in the design the whole netlist would need to be selected again. In addition, there were bugs in the software that greatly affected the turn-around time of debugging, and it crashed several times. If PlanAhead were used for floorplanning and other design tasks, then it might have proved much more useful. In place of PlanAhead, we utilized the Core Generator within ISE. The ChipScope cores provided by Xilinx include ICON, ILA, VIO, ATC2, and IBERT. The designer can choose which cores to insert by using the Core Generator in ISE. The ICON core provides communication between the different cores and the computer running ChipScope. It can connect up to fifteen ILA, VIO, and ATC2 cores. The ILA core is used to synchronously monitor internal signals. It contains logic to trigger on inputs and outputs and to capture data. ILA cores allow up to sixteen trigger ports, which can be 1 to 256 bits wide. The VIO core can monitor signals like the ILA core, but can also drive internal FPGA signals in real time. The ATC2 core is similar to the ILA core, but was created for Agilent FPGA dynamic probe technology. Finally, the IBERT core contains "all the logic to control, monitor, and change transceiver parameters and perform bit error ratio tests." The only ChipScope cores we were concerned with in this project were the ICON and ILA cores. We inserted one ILA core and one ICON core using the Core Generator within ISE Project Navigator. The ILA core allowed us to monitor internal signals in the FPGA. Instead of inserting a VIO core, which allows inputs to and outputs from ChipScope, we used buttons to trigger the execution of write and read logic.
  • 27. 17 3.2.4 Analysis of Turn-Around Times As introduced in sections 3.2.2 and 3.2.3, implementation takes much longer than synthesis. Therefore, when it comes down to turn-around time, simulation is much more effective for iterative debugging. Figure 3.2 shows the phases for simulation and hardware validation, as well as the time it takes to complete each phase. For simulation, the process starts with Verilog code, which is synthesized and then, using a test bench, run in iSim for viewing. This process takes about eight minutes total. A system's simulation run-time is much longer than if it were running on hardware, but simulation is still faster than hardware validation because it does not have to undergo implementation. The bottleneck in our simulation process is the set-up time for the DDR3 memory model, which accounts for most of the simulation time. Hardware validation starts with Verilog code, which is synthesized, implemented, and imported into ChipScope. This whole process takes about fifteen minutes. Most of the time spent on hardware validation is on implementation of the design. In addition, hardware validation requires more of the user's attention. It is more difficult and takes more time to set up a ChipScope core than it does to create a test bench for simulation. While a test bench (green) involves writing some simple code, a ChipScope core (orange) involves setting up all the signals to be probed. Not only is simulation faster, but the iSim tool is easier to use than ChipScope. Figure 3.3 shows a screen shot of iSim.
  • 28. 18 Figure 3.3 iSim Screen Shot The screen shot of iSim shows the instance names in the first column, all the signals to choose from in the second, and the signals and their waveforms in the third and fourth columns. The user can view any signal without having to port it out of the design and re-implement, as is required when using ChipScope. When adding an additional signal in iSim, only the simulation needs to be restarted. The iSim interface makes debugging much easier with collapsible signal viewing, grouping abilities, and a large window for viewing many signals at once. A screen shot of ChipScope is shown in Figure 3.4. In ChipScope, you can view the devices, signals, triggers, and waveform windows. The length of time ChipScope is able to capture is much less than that of iSim. For this reason, triggers are required to execute different parts of the code; this is where the buttons were utilized. If a signal could not fit into the allowable number of signal inputs or was forgotten, it would need to be added to the design and implemented all over again, a much longer turn-around time than simulation. Therefore, simulation is used for iterative debugging and functionality testing, while hardware validation is the next step to ensure design accuracy.
  • 29. 19 Figure 3.4 ChipScope Screen Shot 3.2.5 Xilinx Core Generator One tool in ISE that was very important to our project was the CORE Generator. The CORE Generator provided us with not only the ChipScope cores, but the memory controller and FIFOs as well. The CORE Generator can be accessed within ISE Project Navigator and provides many additional functions for the designer. The options provided for creating FIFOs, for example, include common or independent clocks, first-word fall-through, a variety of flags to indicate the amount of data in the FIFO, and the write width, read width, and depth. The different width capabilities allowed us to create asynchronous FIFOs. The memory controller was created using the Xilinx Memory Interface Generator (MIG). There were options to use an AXI4, native, or user interface, which is discussed in a following section on interfacing with the Xilinx MIG.
  • 30. 20 CHAPTER 4 ARCHITECTURE The SDR SDRAM Controller consists of four main modules: the SDRAM controller, control interface, command, and data path modules. The SDRAM controller module is the top-level module that instantiates the three lower modules and brings the whole design together. The control interface module accepts commands and related memory addresses from the host, decoding the command and passing the request to the command module. The command module accepts commands and addresses from the control interface module, and generates the proper commands to the SDRAM. The data path module handles the data path operations during WRITEA and READA commands. The SDRAM controller module also instantiates a PLL that is used in the CLOCK_LOCK mode to improve I/O timing. This PLL is not essential to the operation of the SDR SDRAM Controller and can be easily removed. Figure 4 Architecture of SDRAM controller
  • 31. 21 4.1 CONTROL INTERFACE MODULE The control interface module decodes and registers commands from the host, and passes the decoded NOP, WRITEA, READA, REFRESH, PRECHARGE, and LOAD_MODE commands, and ADDR to the command module. The LOAD_REG1 and LOAD_REG2 commands are decoded and used internally to load the REG1 and REG2 registers with values from ADDR. Figure 4.1 shows the control interface module block diagram. Figure 4.1 Control Interface Module
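As a concrete illustration of the decode step described above, the following Verilog sketch registers the host command and address and produces one strobe per decoded command. The module name, port widths, and the CMD[2:0] encodings are assumptions made for illustration (Table 5.3 defines the actual command set); this is a minimal sketch, not the controller's actual source.

module cmd_decode (
    input             clk, reset_n,
    input       [2:0] cmd,       // host command input
    input      [22:0] addr,      // host address (width assumed)
    output reg        nop, reada, writea, refresh, precharge, load_mode,
    output reg [22:0] saddr      // registered address passed to the command module
);
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            {nop, reada, writea, refresh, precharge, load_mode} <= 6'b0;
            saddr <= 23'b0;
        end else begin
            saddr     <= addr;               // register the address with the command
            nop       <= (cmd == 3'b000);    // assumed encodings for illustration
            reada     <= (cmd == 3'b001);
            writea    <= (cmd == 3'b010);
            refresh   <= (cmd == 3'b011);
            precharge <= (cmd == 3'b100);
            load_mode <= (cmd == 3'b101);
        end
    end
endmodule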
  • 32. 22 The control interface module also contains a 16-bit down counter and control circuit that is used to generate periodic refresh commands to the command module. The 16-bit down counter is loaded with the value from REG2 and counts down to zero. The REFRESH_REQ output is asserted when the counter reaches zero and remains asserted until the command module acknowledges the request. The acknowledge from the command module causes the down counter to be reloaded with REG2 and the process repeats. REG2 is a 16-bit value that represents the period between REFRESH commands that the SDR SDRAM Controller issues. The value is set by the equation int(refresh_period/clock_period). For example, if an SDRAM device that is connected to the SDR SDRAM Controller has a 64-ms, 4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked by a 100-MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562d. 4.2 COMMAND MODULE The command module accepts decoded commands from the control interface module, refresh requests from the refresh control logic, and generates the appropriate commands to the SDRAM. The module contains a simple arbiter that arbitrates between the commands from the host interface and the refresh requests from the refresh control logic. The refresh requests from the refresh control logic have priority over the commands from the host interface. If a command from the host arrives at the same time as or during a hidden refresh operation, the arbiter holds off the host by not asserting CMDACK until the hidden refresh operation is complete. If a hidden refresh command is received while a host operation is in progress, the hidden refresh is held off until the host operation is complete. Figure 4.2 shows the command module block diagram.
  • 33. 23 Figure 4.2 Command Module Block Diagram After the arbiter has accepted a command from the host, the command is passed on to the command generator portion of the command module. The command module uses three shift registers to generate the appropriate timing between the commands that are issued to the SDRAM. One shift register is used to control the timing of the ACTIVATE command; a second is used to control the positioning of the READA or WRITEA commands; a third is used to time command durations, which allows the arbiter to determine if the last requested operation has been completed. The command module also performs the multiplexing of the address to the SDRAM. The row portion of the address is multiplexed out to the SDRAM address outputs A[11:0] during the ACTIVATE (RAS) command. The column portion is then multiplexed out to the SDRAM address outputs during a READA (CAS) or WRITEA command. The output signal OE is generated by the command module to control the tristate buffers in the last stage of the DATAIN path in the data path module.
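The shift-register technique used to position the READA or WRITEA (CAS) command relative to the ACTIVATE (RAS) command can be sketched as follows. This is a simplified illustration of the idea (a one-hot token delayed by RCD clock cycles), assuming RCD is at least 2; the names and the exact structure are assumptions, not the actual command module.

module cas_position #(parameter RCD = 3) (  // RAS-to-CAS delay in clocks, RCD >= 2 assumed
    input  clk, reset_n,
    input  start,        // pulses when the arbiter accepts a read or write
    output do_activate,  // drive the ACTIVATE (RAS) command this cycle
    output do_cas        // drive the READA/WRITEA (CAS) command this cycle
);
    reg [RCD-1:0] shift;
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) shift <= {RCD{1'b0}};
        else          shift <= {shift[RCD-2:0], start};   // shift the one-hot token along
    end
    assign do_activate = start;          // row address is multiplexed out in this cycle
    assign do_cas      = shift[RCD-1];   // column address follows RCD cycles later
endmodule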
  • 34. 24 4.3 DATA PATH MODULE The data path module provides the SDRAM data interface to the host. Host data is accepted on DATAIN for WRITEA commands and data is provided to the host on DATAOUT during READA commands. Figure 4.3 shows the data path module block diagram. Figure 4.3 Data Path Module The DATAIN path consists of a 2-stage pipeline to align data properly relative to CMDACK and the commands that are issued to the SDRAM. The DATAOUT path consists of a 2-stage pipeline that registers data from the SDRAM during a READA command. The DATAOUT pipeline delay can be reduced to one or even zero registers, with the only effect being that the relationship of DATAOUT to CMDACK changes.
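A minimal Verilog sketch of the two 2-stage pipelines described above is shown below. The data width, port names, and the use of a single OE signal to control the tristate drivers are assumptions for illustration; the report's data path module may be organized differently.

module sdr_data_path (
    input             clk, reset_n,
    input      [15:0] datain,    // write data from the host
    output reg [15:0] dataout,   // read data to the host
    input             oe,        // output enable from the command module
    inout      [15:0] dq         // bidirectional SDRAM data pins
);
    reg [15:0] din1, din2;       // DATAIN pipeline stages
    reg [15:0] dout1;            // first DATAOUT pipeline stage
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            din1 <= 16'b0; din2 <= 16'b0;
            dout1 <= 16'b0; dataout <= 16'b0;
        end else begin
            din1    <= datain;   // stage 1: align write data to the issued command
            din2    <= din1;     // stage 2: final register before the pins
            dout1   <= dq;       // stage 1: capture read data from the SDRAM
            dataout <= dout1;    // stage 2: present read data to the host
        end
    end
    assign dq = oe ? din2 : 16'bz;   // tristate drivers controlled by OE
endmodule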
  • 35. 25 CHAPTER 5 OPERATION The single data rate (SDR) synchronous dynamic random access memory (SDRAM) controller provides a simplified interface to industry standard SDR SDRAM. The SDR SDRAM Controller is available in either Verilog HDL or VHDL and is optimized for the architecture. The SDR SDRAM Controller supports the following features: - Burst lengths of 1, 2, 4, or 8 data words. - CAS latency of 2 or 3 clock cycles. - 16-bit programmable refresh counter used for automatic refresh. - Two chip selects for SDRAM devices. - Support for the NOP, READA, WRITEA, AUTO_REFRESH, PRECHARGE, ACTIVATE, BURST_STOP, and LOAD_MR commands. - Support for full-page mode operation. - Data mask line for write operations. - PLL to increase system performance. Figure 5 SDR SDRAM Controller System-Level Diagram
  • 36. 26 5.1 SDRAM OVERVIEW SDRAM is high-speed dynamic random access memory (DRAM) with a synchronous interface. The synchronous interface and fully-pipelined internal architecture of SDRAM allow extremely fast data rates if used efficiently. Internally, SDRAM devices are organized in banks of memory, which are addressed by row and column. The number of row- and column-address bits and the number of banks depend on the size of the memory. SDRAM is controlled by bus commands that are formed using combinations of the RASN, CASN, and WEN signals. For instance, on a clock cycle where all three signals are high, the associated command is a no operation (NOP). A NOP is also indicated when the chip select is not asserted. Table 5.1 shows the standard SDRAM bus commands. Table 5.1 SDRAM Bus Commands SDRAM banks must be opened before a range of addresses can be written to or read from. The row and bank to be opened are registered coincident with the ACT command. When a bank is accessed for a read or a write, it may be necessary to close the bank and re-open it if the row to be accessed is different from the row that is currently open. Closing a bank is done with the PCH command.
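For reference, the standard JEDEC bus-command encodings summarized in Table 5.1 can be written as Verilog constants over {CSN, RASN, CASN, WEN}. These values reflect the common SDR SDRAM truth table and are the kind of declarations that would sit inside the command module; confirm them against the data sheet of the specific device used, since this listing is not taken from the report itself.

localparam CMD_NOP        = 4'b0111;  // no operation (also any cycle with CSN = 1)
localparam CMD_ACTIVE     = 4'b0011;  // ACT: open a row in a bank
localparam CMD_READ       = 4'b0101;  // RD: begin a read burst
localparam CMD_WRITE      = 4'b0100;  // WR: begin a write burst
localparam CMD_BURST_TERM = 4'b0110;  // BT: terminate a burst
localparam CMD_PRECHARGE  = 4'b0010;  // PCH: close a row
localparam CMD_AUTO_REF   = 4'b0001;  // ARF: auto refresh
localparam CMD_LOAD_MODE  = 4'b0000;  // LMR: load the mode register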
  • 37. 27 The primary commands used to access SDRAM are RD and WR. When the WR command is issued, the initial column address and data word are registered. When a RD command is issued, the initial address is registered. The initial data appears on the data bus 1 to 3 clock cycles later. This is known as CAS latency and is due to the time required to physically read the internal DRAM core and register the data on the bus. The CAS latency depends on the speed of the SDRAM and the frequency of the memory clock. In general, the faster the clock, the more cycles of CAS latency are required. After the initial RD or WR command, sequential reads and writes continue until the burst length is reached or a BT command is issued. SDRAM memory devices support burst lengths of 1, 2, 4, or 8 data cycles. The ARF command is issued periodically to ensure data retention. This function is performed by the SDR SDRAM Controller and is transparent to the user. The LMR command is used to configure the SDRAM mode register, which stores the CAS latency, burst length, burst type, and write burst mode. Consult the SDRAM specification for additional details. SDRAM comes in dual in-line memory modules (DIMMs), small-outline DIMMs (SO-DIMMs) and chips. To reduce pin count, SDRAM row and column addresses are multiplexed on the same pins. SDRAM often includes more than one bank of memory internally, and DIMMs may require multiple chip selects. 5.2 FUNCTIONAL DESCRIPTION Table 5.2 shows the SDR SDRAM Controller interface signals. All signals are synchronous to the system clock and outputs are registered at the SDR SDRAM Controller's outputs.
  • 38. 28 Table 5.2 Interface Signals 5.3 SDRAM CONTROLLER COMMAND INTERFACE The SDR SDRAM Controller provides a synchronous command interface to the SDRAM and several control registers. Table 5.3 shows the commands, which are described in the following sections. The following rules apply to the commands, with reference to Table 5.2: - All commands, except NOP, are driven by the user onto CMD[2:0]; ADDR and DATAIN are set appropriately for the requested command. The controller registers the command on the next rising clock edge.
  • 39. 29 - To acknowledge the command, the controller asserts CMDACK for one clock period. - For READA or WRITEA commands, the user should start receiving or writing data on DATAOUT and DATAIN. - The user must drive NOP onto CMD[2:0] by the next rising clock edge after CMDACK is asserted (a minimal host-side sketch of this handshake is given below, after the NOP command). Table 5.3 Interface Commands 5.3.1 NOP Command NOP is a no-operation command to the controller. When NOP is detected by the controller, it performs a NOP in the following clock cycle. A NOP must be issued in the clock cycle after the controller has acknowledged a command. The NOP command has no effect on SDRAM accesses that are already in progress.
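The host-side handshake implied by the rules above can be sketched as follows: present a command and address, hold them until CMDACK is seen, and drive NOP on the next rising edge. The command encodings, signal names, and the single-command state machine are assumptions made for illustration; only READA is shown.

module host_cmd_issue (
    input             clk, reset_n,
    input             go,          // pulse to issue one READA
    input      [22:0] go_addr,
    input             cmdack,      // acknowledge from the SDR SDRAM Controller
    output reg  [2:0] cmd,
    output reg [22:0] addr
);
    localparam NOP = 3'b000, READA = 3'b001;   // assumed encodings (see Table 5.3)
    reg busy;
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            cmd <= NOP; addr <= 23'b0; busy <= 1'b0;
        end else if (go && !busy) begin
            cmd  <= READA;                     // present command and address together
            addr <= go_addr;
            busy <= 1'b1;                      // hold them until CMDACK
        end else if (busy && cmdack) begin
            cmd  <= NOP;                       // drive NOP on the clock after CMDACK
            busy <= 1'b0;
        end
    end
endmodule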
  • 40. 30 5.3.2 READA Command Figure 5.1 Timing diagram for a READA command The READA command instructs the SDR SDRAM Controller to perform a burst read with auto-precharge to the SDRAM at the memory address specified by ADDR. The SDR SDRAM Controller issues an ACTIVATE command to the SDRAM followed by a READA command. The read burst data first appears on DATAOUT (RCD + CL + 2) clocks after the SDR SDRAM Controller asserts CMDACK. During a READA command the user must keep DM low. When the controller is configured for full-page mode, the READA command becomes READ (read without auto-precharge). Figure 5.1 shows an example timing diagram for a READA command.
  • 41. 31 The following sequence describes the general operation of the READA command: - The user asserts READA, ADDR, and DM. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - One clock after CMDACK is asserted, the user must assert NOP. - After CMDACK, the controller presents the first read burst value on DATAOUT; the remainder of the read burst follows on every clock cycle. 5.3.3 WRITEA Command Figure 5.2 Timing diagram for a WRITEA command The WRITEA command instructs the SDR SDRAM Controller to perform a burst write with auto-precharge to the SDRAM at the memory address specified by ADDR.
  • 42. 32 The SDR SDRAM Controller will issue an ACTIVATE command to the SDRAM followed by a WRITEA command. The first data value in the burst sequence must be presented with WRITEA and the ADDR address. The host must start clocking data, along with the desired DM values, into the SDR SDRAM Controller (tRCD – 2) clocks after the SDR SDRAM Controller has acknowledged the WRITEA command. See an SDRAM data sheet for how to use the data mask lines DM/DQM. When the SDR SDRAM Controller is in full-page mode, WRITEA becomes WRITE (write without auto-precharge). Figure 5.2 shows an example timing diagram for a WRITEA command. The following sequence describes the general operation of a WRITEA command: - The user asserts WRITEA, ADDR, the first write data value on DATAIN, and the desired data mask value on DM, with reference to Tables 5.2 and 5.3. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - One clock after CMDACK is asserted, the user asserts NOP on CMD. - The user clocks data and data mask values into the SDR SDRAM Controller through DATAIN and DM.
  • 43. 33 Figure 5.3 Timing diagram for a REFRESH command The following sequence describes the general operation of a REFRESH command: - The user asserts REFRESH on the CMD input. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - The user asserts NOP on CMD.
  • 44. 34 5.3.5 PRECHARGE Command Figure 5.4 Timing diagram for a PRECHARGE command The PRECHARGE command instructs the SDR SDRAM Controller to perform a PCH command to the SDRAM. The SDR SDRAM Controller acknowledges the command with CMDACK. The PCH command is also used to generate a burst stop to the SDRAM. Using PRECHARGE to terminate a burst is only supported in full-page mode. Note that the SDR SDRAM Controller adds a latency of 4 clocks from when the host issues the command to when the SDRAM sees the PRECHARGE command. If a full-page read burst is to be stopped after 100 cycles, the PRECHARGE command must be asserted (4 + CL – 1) clocks before the desired end of the burst (the CL – 1 requirement is imposed by the SDRAM devices). So if the CAS latency is 3, the PRECHARGE command must be issued (100 – 3 – 1 – 4) = 92 clocks into the burst.
  • 45. 35 Figure 5.4 shows an example timing diagram of the PRECHARGE command. The following sequence describes the general operation of a PRECHARGE command: - The user asserts PRECHARGE on CMD. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - The user asserts NOP on CMD. 5.3.6 LOAD_MODE Command The LOAD_MODE command instructs the SDR SDRAM Controller to perform an LMR command to the SDRAM. The value that is to be written into the SDRAM mode register must be present on ADDR[11:0] with the LOAD_MODE command. The value on ADDR[11:0] is mapped directly to the SDRAM pins A11–A0 when the SDR SDRAM Controller issues the LMR to the SDRAM. Figure 5.5 shows an example timing diagram. The following sequence describes the general operation of a LOAD_MODE command: - The user asserts LOAD_MODE on CMD. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - One clock after the SDR SDRAM Controller asserts CMDACK, the user asserts NOP on CMD.
  • 46. 36 Figure 5.5 Timing diagram for a LOAD_MODE Command 5.3.7 LOAD_REG1 Command The LOAD_REG1 command instructs the SDR SDRAM Controller to load the internal configuration register REG1 with the value presented on ADDR; the REG1 bit fields are defined in Table 5.4 and described below. Table 5.4 REG1 Bit Definitions
  • 47. 37 CL is the CAS latency of the SDRAM memory in clock periods and is dependent on the memory device speed grade and clock frequency. Consult the SDRAM data sheet for appropriate settings. CL must be set to the same value as the CL of the SDRAM memory devices. RCD is the RAS-to-CAS delay in clock periods and is dependent on the SDRAM speed grade and clock frequency. RCD = INT(tRCD/clock_period), where tRCD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock. RRD is the refresh-to-RAS delay in clock periods. RRD is dependent on the SDRAM speed grade and clock frequency. RRD = INT(tRRD/clock_period), where tRRD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock. PM is the page mode bit. If PM = 0, the SDR SDRAM Controller operates in non-page mode. If PM = 1, the SDR SDRAM Controller operates in page mode. See the section "Full-Page Mode Operation" for more information. BL is the burst length the SDRAM devices have been configured for. 5.3.8 LOAD_REG2 Command The LOAD_REG2 command instructs the SDR SDRAM Controller to load the internal configuration register REG2. REG2 is a 16-bit value that represents the period between REFRESH commands that the SDR SDRAM Controller issues. The value is set by the equation int(refresh_period/clock_period). For example, if an SDRAM device connected to the SDR SDRAM Controller has a 64-ms, 4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked by a 100 MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562d. The value that is to be written into REG2 must be presented on the ADDR input simultaneously with the assertion of the LOAD_REG2 command.
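To make the arithmetic above concrete, the register fields can be computed at elaboration time with Verilog localparams, as sketched below. The tRCD and tRRD figures (20 ns and 15 ns) are example data-sheet values assumed for illustration and are not taken from this report; only the 64 ms/4096 refresh case and the 100 MHz clock match the worked example above. Declarations of this kind could live in the host logic or a shared parameter header.

localparam CLK_PERIOD_NS = 10;                         // 100 MHz controller/SDRAM clock
localparam CL   = 3;                                   // CAS latency, from the SDRAM data sheet
localparam RCD  = 20 / CLK_PERIOD_NS;                  // INT(tRCD/clock_period) = 2, assuming tRCD = 20 ns
localparam RRD  = 15 / CLK_PERIOD_NS;                  // INT(tRRD/clock_period) = 1, assuming tRRD = 15 ns
localparam REG2 = (64_000_000 / 4096) / CLK_PERIOD_NS; // 15625 ns / 10 ns = 1562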
  • 48. 38 CHAPTER 6 ELEMENTS OF MEMORY BANK 6.1 DECODER A decoder is a device which does the reverse operation of an encoder, undoing the encoding so that the original information can be retrieved. The same method used to encode is usually just reversed in order to decode. It is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines. 6.1.1 A 2-to-4 line single-bit decoder In digital electronics, a decoder can take the form of a multiple-input, multiple-output logic circuit that converts coded inputs into coded outputs, where the input and output codes are different, e.g. n-to-2^n and binary-coded decimal decoders. Enable inputs must be on for the decoder to function; otherwise its outputs assume a single "disabled" output code word. Decoding is necessary in applications such as data multiplexing, 7-segment displays, and memory address decoding. A simple example of a decoder circuit is an AND gate, because the output of an AND gate is "high" (1) only when all its inputs are "high". Such an output is called an "active-high output". If a NAND gate is connected instead of the AND gate, the output will be "low" (0) only when all its inputs are "high". Such an output is called an "active-low output". A slightly more complex decoder is the n-to-2^n type binary decoder. These types of decoders are combinational circuits that convert binary information from 'n' coded inputs to a maximum of 2^n unique outputs. We say a maximum of 2^n outputs because, in case the 'n'-bit coded information has unused bit combinations, the decoder may have fewer than 2^n outputs.
  • 49. 39 We can have a 2-to-4 decoder, a 3-to-8 decoder or a 4-to-16 decoder. We can form a 3-to-8 decoder from two 2-to-4 decoders (with enable signals). Figure 6.1 RTL of Decoder Similarly, we can also form a 4-to-16 decoder by combining two 3-to-8 decoders. In this type of circuit design, the enable inputs of both 3-to-8 decoders originate from a 4th input, which acts as a selector between the two 3-to-8 decoders. This allows the 4th input to enable either the top or the bottom decoder, which produces outputs D(0) through D(7) for the first decoder, and D(8) through D(15) for the second decoder. Figure 6.2 Simulation of Decoder
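A behavioural Verilog sketch of the 2-to-4 single-bit decoder with an enable input, in the spirit of the RTL shown in Figure 6.1, is given below; the module and port names are illustrative and may differ from the report's source.

module decoder2to4 (
    input        en,       // enable: all outputs low when de-asserted
    input  [1:0] a,        // 2-bit binary input
    output [3:0] d         // one-hot, active-high outputs
);
    assign d = en ? (4'b0001 << a) : 4'b0000;
endmodule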
  • 50. 40 A decoder that contains enable inputs is also known as a decoder-demultiplexer. Thus, we have a 4-to-16 decoder produced by adding a 4th input shared among both decoders, producing 16 outputs. 6.2 DEMUX The data distributor, known more commonly as a demultiplexer or "demux" for short, is the exact opposite of the multiplexer described in Section 6.4. The demultiplexer converts a serial data signal at its input into parallel data at its outputs. Figure 6.3 RTL of DEMUX
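A matching 1-to-4 demultiplexer sketch is shown below: the single data input is routed to the output line chosen by the select bits, while the remaining outputs stay low. Names and widths are illustrative, and the RTL in Figure 6.3 may be organized differently.

module demux1to4 (
    input        din,      // single data input
    input  [1:0] sel,      // selects which output receives din
    output [3:0] dout      // unselected outputs are driven low
);
    assign dout = {3'b000, din} << sel;
endmodule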
  • 51. 41 The demultiplexer takes one single input data line and switches it to any one of a number of individual output lines, one at a time. Figure 6.4 Simulation of DEMUX 6.3 RAM Random-access memory (RAM) is a form of computer data storage. A random-access memory device allows data items to be read and written in roughly the same amount of time regardless of the order in which data items are accessed. In contrast, with other direct-access data storage media such as hard disks, CD-RWs, DVD-RWs and the older drum memory, the time required to read and write data items varies significantly depending on their physical locations on the recording medium, due to mechanical limitations such as media rotation speeds and arm movement delays. Today, random-access memory takes the form of integrated circuits. Strictly speaking, modern types of DRAM are not random access, as data is read in bursts, although the name DRAM/RAM has stuck. However, many types of SRAM are still random access even in a strict sense.
  • 52. 42 RAM is normally associated with volatile types of memory (such as DRAM memory modules), where stored information is lost if the power is removed, although many efforts have been made to develop non-volatile RAM chips. Other types of non-volatile memory exist that allow random access for read operations, but either do not allow write operations or have limitations on them. These include most types of ROM and a type of flash memory called NOR flash. 6.3.1 TYPES OF RAM The two main forms of modern RAM are static RAM (SRAM) and dynamic RAM (DRAM). In SRAM, a bit of data is stored using the state of a flip-flop. This form of RAM is more expensive to produce, but is generally faster and requires less power than DRAM and, in modern computers, is often used as cache memory for the CPU. DRAM stores a bit of data using a transistor and capacitor pair, which together comprise a memory cell. The capacitor holds a high or low charge (1 or 0, respectively), and the transistor acts as a switch that lets the control circuitry on the chip read the capacitor's state of charge or change it. As this form of memory is less expensive to produce than static RAM, it is the predominant form of computer memory used in modern computers. Figure 6.5 RTL of RAM
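A small synchronous single-port RAM sketch of the kind shown in Figure 6.5 is given below. The depth, width, and read-during-write behaviour (the old data is returned) are assumptions made for illustration, not the exact block used in the report.

module ram_sp #(parameter AW = 4, DW = 8) (
    input               clk,
    input               we,                 // write enable
    input      [AW-1:0] addr,
    input      [DW-1:0] din,
    output reg [DW-1:0] dout
);
    reg [DW-1:0] mem [0:(1<<AW)-1];         // 2^AW words of DW bits
    always @(posedge clk) begin
        if (we) mem[addr] <= din;           // synchronous write
        dout <= mem[addr];                  // registered read (returns old data on a write)
    end
endmodule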
  • 53. 43 Both static and dynamic RAM are considered volatile, as their state is lost or reset when power is removed from the system. By contrast, read-only memory (ROM) stores data by permanently enabling or disabling selected transistors, such that the memory cannot be altered. Writeable variants of ROM (such as EEPROM and flash memory) share properties of both ROM and RAM, enabling data to persist without power and to be updated without requiring special equipment. These persistent forms of semiconductor ROM include USB flash drives, memory cards for cameras and portable devices, etc. ECC memory (which can be either SRAM or DRAM) includes special circuitry to detect and/or correct random faults (memory errors) in the stored data, using parity bits or error correction codes. In general, the term RAM refers solely to solid-state memory devices (either DRAM or SRAM), and more specifically the main memory in most computers. In optical storage, the term DVD-RAM is somewhat of a misnomer since, unlike CD-RW or DVD-RW, it does not need to be erased before reuse. Nevertheless, a DVD-RAM behaves much like a hard disk drive, if somewhat slower. Figure 6.6 Simulation of RAM
  • 54. 44 6.4 MUX In electronics, a multiplexer is a device that selects one of several analog or digital input signals and forwards the selected input onto a single line. A multiplexer of 2^n inputs has n select lines, which are used to select which input line to send to the output. Multiplexers are mainly used to increase the amount of data that can be sent over the network within a certain amount of time and bandwidth. A multiplexer is also called a data selector. Figure 6.7 RTL of MUX An electronic multiplexer can be considered a multiple-input, single-output switch, and a demultiplexer a single-input, multiple-output switch. The schematic symbol for a multiplexer is an isosceles trapezoid with the longer parallel side containing the input pins and the shorter parallel side containing the output pin.
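A 4-to-1 multiplexer sketch for the data selector described above follows; a 2-to-1 version uses the same pattern with a single select bit. The port names and data width are illustrative.

module mux4to1 #(parameter DW = 8) (
    input  [DW-1:0] in0, in1, in2, in3,
    input  [1:0]    sel,
    output [DW-1:0] out
);
    assign out = (sel == 2'b00) ? in0 :
                 (sel == 2'b01) ? in1 :
                 (sel == 2'b10) ? in2 : in3;   // sel == 2'b11 selects in3
endmodule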
  • 55. 45 The schematic shows a 2-to-1 multiplexer on the left and an equivalent switch on the right. The wire connects the desired input to the output. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal. Figure 6.8 Simulation of MUX 6.5 BUFFER A buffer amplifier (sometimes simply called a buffer) is one that provides electrical impedance transformation from one circuit to another. Two main types of buffer exist: the voltage buffer and the current buffer.
  • 56. 46 6.5.1 VOLTAGE BUFFER A voltage buffer amplifier is used to transfer a voltage from a first circuit, having a high output impedance level, to a second circuit with a low input impedance level. The interposed buffer amplifier prevents the second circuit from loading the first circuit unacceptably and interfering with its desired operation. In an ideal voltage buffer, the input resistance is infinite and the output resistance is zero (the impedance of an ideal voltage source is zero). Other properties of the ideal buffer are: perfect linearity, regardless of signal amplitudes; and instant output response, regardless of the speed of the input signal. If the voltage is transferred unchanged (the voltage gain Av is 1), the amplifier is a unity gain buffer, also known as a voltage follower because the output voltage follows or tracks the input voltage. Although the voltage gain of a voltage buffer amplifier may be (approximately) unity, it usually provides considerable current gain and thus power gain. However, it is commonplace to say that it has a gain of 1 (or the equivalent 0 dB), referring to the voltage gain. As an example, consider a Thévenin source (voltage VA, series resistance RA) driving a resistor load RL. Because of voltage division (also referred to as "loading"), the voltage across the load is only VA RL / (RL + RA). However, if the Thévenin source drives a unity gain buffer, the voltage input to the amplifier is VA, with no voltage division, because the amplifier input resistance is infinite. At the output, the dependent voltage source delivers voltage Av VA = VA to the load, again without voltage division, because the output resistance of the buffer is zero. A Thévenin equivalent circuit of the combined original Thévenin source and the buffer is an ideal voltage source VA with zero Thévenin resistance. Figure 6.9 RTL of Buffer
  • 57. 47 6.5.2 CURRENT BUFFER Typically a current buffer amplifier is used to transfer a current from a first circuit, having a low output impedance level, to a second circuit with a high input impedance level. The interposed buffer amplifier prevents the second circuit from loading the first circuit unacceptably and interfering with its desired operation. In the ideal current buffer, the input impedance is zero and the output impedance is infinite (the impedance of an ideal current source is infinite). Again, other properties of the ideal buffer are: perfect linearity, regardless of signal amplitudes; and instant output response, regardless of the speed of the input signal. For a current buffer, if the current is transferred unchanged (the current gain βi is 1), the amplifier is again a unity gain buffer; this time known as a current follower because the output current follows or tracks the input current. Figure 6.10 Simulation of Buffer
  • 58. 48 As an example, consider a Norton source (current IA, parallel resistance RA) driving a resistor load RL. Because of current division (also referred to as "loading"), the current delivered to the load is only IA RA / (RL + RA). However, if the Norton source drives a unity gain current buffer, the current input to the amplifier is IA, with no current division, because the amplifier input resistance is zero. At the output, the dependent current source delivers current βi IA = IA to the load, again without current division, because the output resistance of the buffer is infinite. A Norton equivalent circuit of the combined original Norton source and the buffer is an ideal current source IA with infinite Norton resistance. 6.6 MEMORY BANK A memory bank is a logical unit of storage in electronics, which is hardware dependent. In a computer, the memory bank may be determined by the memory access controller along with the physical organization of the hardware memory slots. In a typical synchronous dynamic random-access memory (SDRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM), a bank consists of multiple rows and columns of storage units and is usually spread out across several chips. In a single read or write operation, only one bank is accessed; therefore, the number of bits in a column or a row, per bank and per chip, equals the memory bus width in bits (for a single channel). The size of a bank is further determined by the number of bits in a column and a row per chip, multiplied by the number of chips in a bank.
  • 59. 49 Figure 6.11 RTL Of Memory Bank Some computers have several identical memory banks of RAM, and use bank switching to switch between them. Harvard architecture computers have (at least) 2 very different banks of memory, one for program storage and one for data storage.
  • 60. 50 Figure 6.12 Simulation Of Memory Bank
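Putting the pieces together, a small memory bank can be assembled structurally from the sketches given in the preceding sections: the decoder, driven by the bank-select bits, enables one of four RAM blocks, and a multiplexer returns the selected block's read data. This composition is illustrative only and is not the RTL of Figure 6.11; it reuses the hypothetical decoder2to4, ram_sp and mux4to1 modules sketched earlier.

module memory_bank #(parameter AW = 4, DW = 8) (
    input           clk,
    input           we,          // write enable for the selected bank
    input  [1:0]    bank_sel,    // selects one of four banks
    input  [AW-1:0] addr,        // address within the selected bank
    input  [DW-1:0] din,
    output [DW-1:0] dout
);
    wire [3:0]    en;                    // one-hot bank enables from the decoder
    wire [DW-1:0] q0, q1, q2, q3;        // per-bank read data

    decoder2to4 u_dec (.en(1'b1), .a(bank_sel), .d(en));

    ram_sp #(.AW(AW), .DW(DW)) u_b0 (.clk(clk), .we(we & en[0]), .addr(addr), .din(din), .dout(q0));
    ram_sp #(.AW(AW), .DW(DW)) u_b1 (.clk(clk), .we(we & en[1]), .addr(addr), .din(din), .dout(q1));
    ram_sp #(.AW(AW), .DW(DW)) u_b2 (.clk(clk), .we(we & en[2]), .addr(addr), .din(din), .dout(q2));
    ram_sp #(.AW(AW), .DW(DW)) u_b3 (.clk(clk), .we(we & en[3]), .addr(addr), .din(din), .dout(q3));

    mux4to1 #(.DW(DW)) u_mux (.in0(q0), .in1(q1), .in2(q2), .in3(q3), .sel(bank_sel), .out(dout));
endmodule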
  • 61. 51 CHAPTER 7 RESULTS AND CONCLUSIONS 7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON 7.1.1 Project Table 7.1 Project 7.1.2 Device Table 7.2 Device
  • 62. 52 7.1.3 Environment Table 7.3 Environment 7.1.4 Default Activity Table 7.4 Default Activity
  • 63. 53 7.1.5 On-Chip Power Summary Table 7.5 On-Chip Power Summary 7.1.6 Thermal Summary Table 7.6 Thermal Summary 7.1.7 Power Supply Summary Table 7.7 Power Supply Summary
  • 64. 54 Table 7.8 Power Supply Current 7.1.8 Confidence Level Table 7.9 Confidence Level
  • 65. 55 7.1.9 By Hierarchy Table 7.10 By Hierarchy
  • 66. 56 7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE 7.2.1. Project Table 7.11 Project 7.2.2 Device Table 7.12 Device
  • 67. 57 7.2.3 Environment Table 7.13 Environment 7.2.4 Default Activity Rates Table 7.14 Default Activity
  • 68. 58 7.2.5 On-Chip Power Summary Table 7.15 On-Chip Power Summary 7.2.6 Thermal Summary Table 7.16 Thermal Summary 7.2.7 Power Supply Summary Table 7.17 Power Supply Summary
  • 69. 59 Table 7.18 Power Supply Current 7.2.8 Confidence Level Table 7.19 Confidence Level
  • 70. 60 7.2.9 By Hierarchy Table 7.20 By Hierarchy 7.3 CONCLUSION This project addresses the problem of finding a memory map for firm real-time workloads in the context of SDRAM memory controllers. Existing controllers use either a static memory map or provide only limited configurability. We use the number of banks that requests are interleaved over as a flexible configuration parameter, while previous work considers it a fixed part of the controller architecture. We use this degree of freedom to optimize the memory configuration for the mix of applications and their requirements. This is beneficial for the worst-case performance in terms of bandwidth, latency and power.
  • 71. 61 CHAPTER 8 FUTURE SCOPE The advantages of this controller compared to SDR SDRAM, DDR1 SDRAM and DDR2 SDRAM controllers are that it synchronizes the data transfer, the data transfer is twice as fast as before, and the production cost is also very low. We have successfully designed the controller using Verilog HDL and synthesized it using the Xilinx tool. 1. DDR4 SDRAM is the 4th generation of DDR SDRAM. 2. DDR3 SDRAM improves on DDR SDRAM by using differential signalling and lower voltages to support significant performance advantages over DDR SDRAM. 3. DDR3 SDRAM standards are still being developed and improved.
  • 72. 62 REFERENCES [1] C. van Berkel, “Multi-core for Mobile Phones,” in Proc. DATE, 2009. [2] “International Technology Roadmap for Semiconductors (ITRS),” 2009. [3] P. Kollig et al., “Heterogeneous Multi-Core Platform for Consumer Multimedia Applications,” in Proc. DATE, 2009. [4] L. Steffens et al., “Real-Time Analysis for Memory Access in Media Processing SoCs : A Practical Approach,” Proc. ECRTS, 2008. [5] S. Bayliss et al., “Methodology for designing statically scheduled application-specific SDRAM controllers using constrained local search, “in Proc. FPT, 2009. [6] B. Akesson et al., “Architectures and modelling of predictable memory controllers for improved system integration,” in Proc. DATE, 2011. [7] J. Reineke et al., “PRET DRAM Controller: Bank Privatization for Predictability and Temporal Isolation,” in Proc. CODES+ISSS, 2011. [8] M. Paolieri et al., “An Analyzable Memory Controller for Hard Real-Time CMPs,” Embedded Systems Letters, IEEE, vol. 1, no. 4, 2009. [9] Micron Technology Inc., “DDR3-800-1Gb SDRAM Datasheet, 02/10 EN edition,” 2006. [10] D. Stiliadis et al., “Latency-rate servers: a general model for analysis of traffic scheduling algorithms,” IEEE/ACM Trans. Netw., 1998. [11] B. Akesson et al., “Classification and Analysis of Predictable Memory Patterns,” in Proc.RTCSA, 2010. [12] DDR2 SDRAM Specification, JESD79-2E ed., JEDEC Solid State Technology Association, 2008. [13] DDR3 SDRAM Specification, JESD79-3D ed., JEDEC Solid State Technology Association, 2009.
  • 73. 63 [14] K. Chandrasekar et al., “Improved Power Modelling of DDR SDRAMs,” in Proc. DSD, 2011. [15] B. Akesson et al., “Automatic Generation of Efficient Predictable Memory Patterns,” in Proc. RTCSA, 2011.