Project Report On
Memory map selection of real time SDRAM
controller using Verilog
By
RAHUL VERMA
(9015694258)
TABLE OF CONTENTS Page
DECLARATION ............................................................................................................................ii
CERTIFICATE .............................................................................................................................iii
ACKNOWLEDGEMENTS ..........................................................................................................iv
ABSTRACT ..................................................................................................................................vi
LIST OF FIGURES .....................................................................................................................vii
LIST OF TABLES........................................................................................................................viii
LIST OF ABBREVIATIONS……………………………………………………………………...ix
CHAPTER 1 (INTRODUCTION)……………………………………………………………..01
1.1 LITERATURE SURVEY……………………………………………………...02
1.2 GOAL OF THE PROJECT…………………………………………………….03
CHAPTER 2 (BACKGROUND)………………………………………………………………04
2.1 RANDOM ACCESS MEMORY…………………………………........ ……..04
2.2 STATIC RANDOM ACCESS MEMORY …………………………………....04
2.3 DYNAMIC RANDOM ACCESS MEMORY ……………..………………….05
2.4 DEVELOPMENT OF DRAM ………………………………………………...06
2.4.1 DRAM …………………………………………………………………...07
2.4.2 SYNCHRONOUS DRAM……………………………………………….07
2.4.3 DDR1 SDRAM…………………………………………………………....08
2.4.4 DDR2 SDRAM……………………………………………………………08
2.4.5 DDR3 SDRAM………………………………………………………..…09
2.5 TIMELINE……………………………………………………………………09
CHAPTER 3 (METHODOLOGY)…………………………………………………………...11
3.1 HARDWARE…………………………………………………………………11
3.1.1 VIRTEX-6 FPGA………………………………………………………..11
3.1.2 ML605 BOARD………………………………………………………...12
3.2 TOOLS………………………………………………………………………..12
3.2.1 XILINX INTEGRATED SOFTWARE ENVIRONMENT (ISE)……..13
3.2.2 SYNTHESIS AND SIMULATION……………………. ……………..14
3.2.3 IMPLEMENTATION AND HARDWARE VALIDATION…………...14
3.2.4 ANALYSIS OF TURN-AROUND TIMES…………………………….17
3.2.5 XILINX CORE GENERATOR…………………………………………19
CHAPTER 4 (ARCHITECTURE)……………………………………………………………20
4.1 CONTROL INTERFACE MODULE…………………………………………21
4.2 COMMAND MODULE…………………….……………….………………...22
4.3 DATA PATH MODULE…………………………………………………………..24
CHAPTER 5 (OPERATION).....................................................................................................25
5.1 SDRAM OVERVIEW…………………………………………………………26
5.2 FUNCTIONAL DESCRIPTION………………………………………………27
5.3 SDRAM CONTROLLER COMMAND INTERFACE……………………….28
5.3.1 NOP COMMAND……………………………………………………….29
5.3.2 READA COMMAND…………………………………………………...30
5.3.3 WRITEA COMMAND……………………………………………….…31
5.3.4 REFRESH COMMAND…………………………………………….…..32
5.3.5 PRECHARGE COMMAND………………………………………….....34
5.3.6 LOAD_MODE COMMAND……………………………………………35
5.3.7 LOAD_REG1 COMMAND……………………………………………..36
5.3.8 LOAD_REG2 COMMAND……………………………………………..37
CHAPTER 6 (ELEMENTS OF MEMORY BANK)…………………………………………38
6.1 DECODER…………………………………………………………………….38
6.1.1 A 2 TO 4 SINGLE BIT DECODER…………………………………….38
6.2 DEMUX………………………………………………………………………..40
6.3 RAM…………………………………………………………………………...41
6.3.1 TYPES OF RAM………………………………………………………...42
6.4 MUX…………………………………………………………………………...44
6.5 BUFFER……………………………………………………………………….45
6.5.1 VOLTAGE BUFFER…………………………………………………….46
6.5.2 CURRENT BUFFER…………………………………………………….47
6.6 MEMORY BANK……………………………………………………………..48
CHAPTER 7 (RESULT AND CONCLUSIONS)……………………………………………..51
7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON…………..………51
7.1.1 PROJECT………………………………………………………...………51
7.1.2 DEVICE ………………………………………………………………….51
7.1.3 ENVIRONMENT ……………………………………………………….52
7.1.4 DEFAULT ACTIVITY………...………………………………….……..52
7.1.5 ON-CHIP POWER SUMMARY………………………………………...53
7.1.6 THERMAL SUMMARY………………………………………………...53
7.1.7 POWER SUPPLY SUMMARY………………………………………….53
7.1.8 CONFIDENCE LEVEL………………………………………………….54
7.1.9 BY HIERARCHY………………………………………………………..55
7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE…..56
7.2.1. PROJECT………………………………………………………………..56
7.2.2 DEVICE……………………………………………………………….....56
7.2.3 ENVIRONMENT………………………………………………………...57
7.2.4 DEFAULT ACTIVITY RATES…………………………………………57
7.2.5 ON-CHIP POWER SUMMARY………………………………………...58
7.2.6 THERMAL SUMMARY………………………………………………...58
7.2.7 POWER SUPPLY SUMMARY……………………………………...….58
7.2.8 CONFIDENCE LEVEL………………………………………………….59
7.2.9 BY HIERARCHY………………………………………………………..60
7.3 CONCLUSION…………………………………………………………….….60
CHAPTER 8 (FUTURE SCOPE)……………………………………………………………...61
REFERENCES...............................................................................................................................62
LIST OF FIGURES Page
Figure 2.1 DRAM Row Access Latency vs. Year 09
Figure 2.2 DRAM Column Address Time vs. Year 10
Figure 3.1 Screenshot of ISE Project Navigator 13
Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation 15
Figure 3.3 ISim Screen Shot 18
Figure 3.4 CHIPSCOPE Screen Shot 19
Figure 4.0 Architecture of SDRAM controller 20
Figure 4.1 Control Interface Module 21
Figure 4.2 Command Module Block Diagram 23
Figure 4.3 Data Path Module 24
Figure 5.0 SDR SDRAM Controller System-Level Diagram 25
Figure 5.1 Timing diagram for a READA command 30
Figure 5.2 Timing diagram for a WRITEA command 31
Figure 5.3 Timing diagram for a REFRESH command 32
Figure 5.4 Timing diagram for a PRECHARGE command 34
Figure 5.5 Timing diagram for a LOAD_MODE command 35
Figure 6.1 RTL of decoder 39
Figure 6.2 Simulation of Decoder 40
Figure 6.3 RTL of DEMUX 41
Figure 6.4 Simulation Of DEMUX 42
Figure 6.5 RTL of RAM 44
Figure 6.6 Simulation of RAM 44
Figure 6.7 RTL of MUX 46
Figure 6.8 Simulation of MUX 46
Figure 6.9 RTL of Buffer 48
Figure 6.10 Simulation of Buffer 49
Figure 6.11 RTL of Memory Bank 50
Figure 6.12 Simulation of Memory Bank 50
LIST OF TABLES Page
Table 5.1 SDRAM Bus Commands 26
Table 5.2 Interface Signals 28
Table 5.3 Interface Commands 29
Table 5.4 REG1 Bit Definitions 36
Table 7.1 Project 51
Table 7.2 Device 51
Table 7.3 Environment 52
Table 7.4 Default Activity 52
Table 7.5 On-Chip Power Summary 53
Table 7.6 Thermal Summary 53
Table 7.7 Power Supply Summary 53
Table 7.8 Power Supply Current 54
Table 7.9 Confidence Level 54
Table 7.10 By Hierarchy 55
Table 7.11 Project 56
Table 7.12 Device 56
Table 7.13 Environment 57
Table 7.14 Default Activity 57
Table 7.15 On-Chip Power Summary 58
Table 7.16 Thermal Summary 58
Table 7.17 Power Supply Summary 58
Table 7.18 Power Supply Current 59
Table 7.19 Confidence Level 59
Table 7.20 By Hierarchy 60
LIST OF ABBREVIATIONS
A/D Analog To Digital
CAS Column Address Strobe
CLB Configurable Logic Block
DRAM Dynamic Random-Access Memory
FPGA Field-Programmable Gate Array
ISE Integrated Software Environment
I/O Input/ Output
LUTs Look-Up Tables
NCD Native Circuit Description
RAM Random Access Memory
RAS Row Address Strobe
ROM Read Only Memory
SDRAM Synchronous Dynamic Random-Access Memory
SRAM Static Random-Access Memory
XST Xilinx Synthesis Technology
CHAPTER 1
INTRODUCTION
Embedded applications with real-time requirements are mapped to heterogeneous multiprocessor systems. The computational demands placed upon these systems are continuously increasing, while power and area budgets limit the amount of resources that can be expended. To reduce costs, applications are often forced to share hardware resources. Functional correctness of a real-time application is only guaranteed if its timing requirements are considered throughout the entire system; when the requirements are not met, the result may be an unacceptable loss of functionality or severe quality degradation. We focus on the real-time properties of the (off-chip) memory.
SDRAM is a commonly used memory type because it provides a large amount of storage space at
low cost per bit. It comprises a hierarchical structure of banks and rows that have to be opened
and closed explicitly by the memory controller, where only one row in each bank can be open at
a time. Requests to the open row are served at low latency, while a request to a different row results in a high latency, since it requires closing the open row and subsequently opening the requested row. Locality thus strongly influences the performance of the memory
subsystem.
The worst-case (minimum) bandwidth and worst-case (maximum) latency are determined by
the way requests are mapped to the memory. The worst-case latency can be optimized by
accessing the memory at a small granularity (i.e. few words), such that the individual requests
take a small amount of time to complete. This allows fine-grained sharing of the memory
resource, at the expense of efficiency, since the overhead of opening and closing rows is
amortized over only a small number of bits. Latency sensitive requests like cache misses favor
this configuration. Conversely, to optimize for bandwidth, the memory has to be used as
efficiently as possible, which requires memory maps that use a large access granularity.
Existing memory controllers offer only limited configurability of the memory mapping and are
unable to balance this trade-off based on the application requirements. A memory controller
must take the latency and bandwidth requirements of all of its applications into account, while
staying within the given power budget. This requires an understanding of the effect that
different memory maps have on the attainable worst-case bandwidth, latency and power.
1.1 LITERATURE SURVEY
Synchronous DRAM (SDRAM) has become a mainstream memory of choice in embedded
system memory design due to its speed, burst access and pipeline features. For high-end
applications using processors such as Motorola MPC 8260 or Intel StrongArm, the interface to
the SDRAM is supported by the processor’s built-in peripheral module. However, for other
applications, the system designer must design a controller to provide proper commands for
SDRAM initialization, read/write accesses and memory refresh.
In some cases, SDRAM is chosen because the previous generations of DRAM (FP and EDO) are
either end-of-life or not recommended for new designs by the memory vendors. From the board
design point of view, design using earlier generations of DRAM is much easier and more
straightforward than using SDRAM unless the system bus master provides the SDRAM interface
module as mentioned above. This SDRAM controller reference design, located between the
SDRAM and the bus master, reduces the user’s effort to deal with the SDRAM command
interface by providing a simple generic system interface to the bus master.
In today's SDRAM market, there are two major types of SDRAM distinguished by their data
transfer rates. The most common single data rate (SDR) SDRAM transfers data on the rising edge
of the clock. The other is the double data rate (DDR) SDRAM which transfers data on both the
rising and falling edges to double the data transfer throughput. Other than the data transfer phase and the different power-on initialization and mode register definitions, these two SDRAMs share the same command set and basic design concepts. This reference design is targeted at SDR SDRAM; however, due to the similarity of SDR and DDR SDRAM, it can also be modified into a DDR SDRAM controller.
For illustration purposes, the Micron SDR SDRAM MT48LC32M4A2 (8Meg x 4 x 4 banks) is
chosen for this design. Also, this design has been verified by using Micron’s simulation model.
It is highly recommended to download the simulation model from the SDRAM vendors for
timing simulation when any modifications are made to this design.
Several SDRAM controllers focusing on real-time applications have been proposed, all trying to maximize the worst-case performance. One approach uses a static command schedule computed at design time; full knowledge of the application behavior is thus required, making it unable to deal with dynamism in the request streams. Another proposed controller dynamically schedules pre-computed sequences of SDRAM commands according to a fixed set of scheduling rules, and a further controller follows a similar approach: it dynamically schedules commands at run-time according to a set of rules from which an upper bound on the latency of a request is determined, and it uses a memory map that always interleaves requests over all banks in the SDRAM, which sets a high lower bound on the smallest request size that can be supported efficiently. One controller supports multiple bursts to each bank in an access to increase guaranteed bandwidth for large requests; another allows only single-burst accesses to all banks in a fixed sequential manner, although multiple banks can be clustered to create a single logical resource. None of the mentioned controllers take power into account, despite it being an increasingly important design constraint.
1.2 GOAL OF THE PROJECT
1) We explore the full memory map design space by allowing requests to be interleaved over a
variable number of banks. This reduces the minimum access granularity and can thus be
beneficial for applications with small requests or tight latency constraints.
2) We propose a configuration methodology that is aware of the real-time and power constraints,
such that an optimal memory map can be selected.
CHAPTER 2
BACKGROUND
There are two different types of random access memory: static and dynamic. Static random access memory (SRAM) is used for high-speed, low-power applications, while dynamic random access memory (DRAM) is used for its low cost and high density. Designers have been working to make DRAM faster and more energy efficient. The following sections will discuss the differences between these two types of RAM, as well as present the progression of DRAM towards a faster, more energy efficient design.
2.1 RANDOM ACCESS MEMORY
Today, the most common type of memory used in digital systems is random access memory
(RAM). The time it takes to access RAM is not affected by the data’s location in memory. RAM
is volatile, meaning if power is removed, then the stored data is lost. As a result, RAM cannot be
used for permanent storage. However, RAM is used during runtime to quickly store and retrieve
data that is being operated on by a computer. In contrast, nonvolatile memory, such as hard
disks, can be used for storing data even when not powered on. Unfortunately, it takes much
longer for the computer to store and access data from this memory. There are two types of
RAM: static and dynamic. In the following sections the differences between the two types and
the evolution of DRAM will be discussed.
2.2 STATIC RANDOM ACCESS MEMORY
Static random access memory (SRAM) stores data as long as power is being supplied to the
chip.
Each memory cell of SRAM stores one bit of data using six transistors: a flip-flop (four transistors) and two access transistors. SRAM is the faster of the two types of RAM because it does not involve capacitors, which require sense amplification of a small charge. For this reason, it is used in the cache memory of computers. Additionally, SRAM requires a very small amount of power to maintain its data in standby mode. Although SRAM is fast and energy efficient, it is also expensive due to the amount of silicon needed for its large cell size. This presented the need for a denser memory cell, which brought about DRAM.
2.3 DYNAMIC RANDOM ACCESS MEMORY
According to Wakerly, “In order to build RAMs with higher density (more bits per chip), chip designers invented memory cells that use as little as one transistor per bit.” Each DRAM cell consists of one transistor and a capacitor. Since capacitors “leak,” or lose charge over time, DRAM must have a refresh cycle to prevent data loss.
According to a high-performance DRAM study on earlier versions of DRAM, DRAM’s
refresh cycle is one reason DRAM is slower than SRAM. The cells of DRAM use sense
amplifiers to transmit data to the output buffer in the case of a read and transmit data back to the
memory cell in the case of a refresh. During a refresh cycle, the sense amplifier reads the degraded value on a capacitor into a D-latch and writes back the same value to the capacitor so it is charged correctly for a 1 or 0. Since all rows of memory must be refreshed and the sense amplifier must determine the value of an already small, degenerated capacitance, refresh takes a significant amount of time. The refresh cycle typically occurs about every 64 milliseconds; the refresh rate of the latest DRAM (DDR3) is about 1 microsecond.
Although refresh increases memory access time, according to a high-performance DRAM study
on earlier versions of DRAM, the greatest amount of time is lost during row
addressing, more specifically, “[extracting] the required data from the sense amps/row caches” .
During addressing, the memory controller first strobes the row address (RAS) onto the address
bus. Once the RAS is sent, a sense amplifier (one for each cell in the row) determines if a
charge indicating a 1 or 0 is loaded into each capacitor.
This step is long because “the sense amplifier has to read a very weak charge” and “the row is
formed by the gates of memory cells.” The controller then chooses a cell in the row from which to read by strobing the column address (CAS) onto the address bus. A write
requires the enable signal to be asserted at the same time as the CAS, while a read requires the
enable signal to be de-asserted. The time it takes the data to move onto the bus after the CAS is
called the CAS latency.
Although recent generations of DRAM are still slower than SRAM, DRAM is used when a
larger amount of memory is required since it is less expensive. For example, in embedded systems, a small block of SRAM is used for the critical data path, and a large block of DRAM is used to satisfy all other needs. The following section will discuss the development of
DRAM into a faster, more energy efficient memory.
2.4 DEVELOPMENT OF DRAM
Many factors are considered in the development of high performance RAM. Ideally, the
developer would always like memory to transfer more data and respond in less time; memory
would have higher bandwidth and lower latency. However, improving upon one factor often
involves sacrificing the other.
Bandwidth is the amount of data transferred per second. It depends on the width of the data
bus and the frequency at which data is being transferred. Latency is the time between when the
address strobe is sent to memory and when the data is placed on the data bus. DRAM is slower than SRAM because it is periodically unavailable during the refresh cycle and because it takes a much longer time to extract data onto the memory bus. Advancements have been made, however, to several different aspects of DRAM to increase bandwidth and decrease latency.
Over time, DRAM has evolved to become faster and more energy efficient by decreasing in cell
size and increasing in capacity. In the following section, we will look at different types of
DRAM and how DDR3 memory has come to be.
2.4.1 DRAM
One of the reasons the original DRAM was very slow is because of extensive addressing
overhead. In the original DRAM, an address was required for every 64-bit access to memory.
Each access took six clock cycles. For a four 64-bit access to consecutive addresses in memory,
the notation for timing was 6-6-6-6. Dashes separate memory accesses and the numbers indicate
how long the accesses take. This DRAM timing example took 24 cycles to access the memory
four times. In contrast, more recent DRAM implements burst technology which can send
many 64-bit words toconsecutive addresses. While the first access still takes six clock cycles
due memory accessing, the next three adjacent addresses can be performed in as little as one
clock cycle since the addressing does not need to be repeated.
During burst mode, the timing would be 6-1-1-1, a total of nine clock cycles. The original DRAM
is also slower than its descendants because it is asynchronous. This means there is no memory
bus clock to synchronize the input and output signals of the memory chip. The timing
specifications are not based on a clock edge, but rather on maximum and minimum timing
values (in seconds). The user would need to worry about designing a state machine with idle
states, which may be inconsistent when running the memory at different frequencies.
2.4.2 Synchronous DRAM
In order to decrease latency, SDRAM utilizes a memory bus clock to synchronize signals to and
from the system and memory. Synchronization ensures that the memory controller does not need
to follow strict timing; it simplifies the implemented logic and reduces memory access latency.
With a synchronous bus, data is available at each clock cycle.
SDRAM divides memory into two to four banks for concurrent access to different parts of
memory. Simultaneous access allows continuous data flow by ensuring there will always be a memory bank ready for access. The addition of banks adds another segment to the addressing,
resulting in a bank, row and column address.
The memory controller determines if an access addresses the same bank and row as the
previous access, so only a column address strobe must be sent. This allows the access to occur
much more quickly and can decrease overall latency.
2.4.3 DDR1 SDRAM
DDR1 SDRAM (i.e. the first generation of DDR SDRAM) doubles the data rate (hence the term DDR) of SDRAM without changing the clock speed or frequency. DDR transfers data on both the rising and falling edges of the clock, and has a pre-fetch buffer and low-voltage signaling, which makes it more energy efficient than previous designs.
Unlike SDRAM, which transfers 1 bit per clock cycle from the memory array to the data queue,
DDR1 transfers 2 bits to the queue in two separate pipelines. The bits are released in order on the
same output line. This is called a 2n-prefetch architecture. In addition, DDR1 utilizes double
transition clocking by triggering on both the rising and falling edge of the clock to transfer data.
As a result, the bandwidth of DDR1 is doubled without an increase in the clock frequency.
In addition to doubling the bandwidth, DDR1 made advances in energy efficiency. DDR1 can
operate at 2.5V instead of the 3.3V operating point of SDRAM thanks to low voltage signaling
technology.
2.4.4 DDR2 SDRAM
Data rates of DDR2 SDRAM are up to eight times those of the original SDRAM. At an operating voltage of 1.8 V, it achieves lower power consumption than DDR1. DDR2 SDRAM has a 4-bit prefetch buffer, an improvement over the DDR1 2-bit prefetch. This means that 4 bits are
transferred per clock cycle from the memory array to the data bus, which increases bandwidth.
2.4.5 DDR3 SDRAM
DDR3 provides two burst modes for both reading and writing: burst chop (BC4) and burst
length eight (BL8). BC4 allows bursts of four by treating data as though half of it is masked.
This creates smooth transitioning if switching from DDR2 to DDR3 memory. However, burst
mode BL8 is the primary burst mode. BL8 allows the most data to be transferred in the least
amount of time; it transfers the greatest number of 64-bit data packets (eight) to or from
consecutive addresses in memory, which means addressing occurs once for every eight data
packets sent. In order to support a burst length of eight data packets, DDR3 SDRAM has an 8-bit prefetch buffer. DDR3, like its predecessors, improves not only bandwidth but also energy conservation. Power consumption of DDR3 can be up to 30 percent less than that of DDR2. The DDR3 operating voltage is the lowest yet, at 1.5 V, and low-voltage versions are supported at 1.35 V.
2.5 TIMELINE
Ideally, memory performance would improve at the same rate as central processing unit
(CPU) performance. However, memory latency has only improved about five percent each
year. The longest latency (RAS latency) of the newest release of DRAM for each year is
shown in the plot in Figure 2.1.
Figure 2.1 DRAM Row Access Latency vs. Year
As seen in Figure 2.1, the row access latency decreases linearly with every new release of
DRAM until 1996. Once SDRAM is released in 1996, the difference in latency from year to
year is much smaller. With recent memory releases it is much more difficult to reduce RAS
latency.
This can be seen especially for the DDR2 and DDR3 memory releases from 2006 to 2012. CAS latency, unlike RAS latency, consistently decreases (bandwidth increases) with every memory release, and in the new DDR3 memory it is very close to 0 ns. Figure 2.2 shows the column access latency.
Figure 2.2 DRAM Column Address Time vs. Year
Looking at some prominent areas of the CAS graph, it can be seen in Figure 2.2 that bandwidth
greatly increased (CAS decreased) from 1983 to 1986. This is due to the switch from NMOS DRAMs to CMOS DRAMs. In 1996 the first SDRAM was released. The CAS latency decreased
(bandwidth increased) due to synchronization and banking. In later years, the CAS latency does
not decrease by much, but this is expected since the latency is already much smaller. Comparing
Figure 2.2 to Figure 2.1, CAS time decreases much more drastically than RAS time. This
means the bandwidth greatly improves, while latency improves much more slowly. In 2010,
when DDR2 was released, it can be seen that latency was sacrificed (Figure 2.1) for an
increase in bandwidth (Figure 2.2).
CHAPTER 3
METHODOLOGY
In this section, the ML605 board and Virtex-6 FPGA hardware are described, as well as the tools utilized for design and validation. The Xilinx Integrated Software Environment (ISE) was used for design, and iSim and ChipScope were used for validation in simulation and in hardware.
3.1 HARDWARE
3.1.1 Virtex-6FPGA
The Virtex-6 FPGA (XC6VLX240T) is used to implement the arbiter. This FPGA has 241,152 logic cells and is organized into banks (40 pins per bank). These logic cells, or slices, are
composed of four look-up tables (LUTs), multiplexers and arithmetic carry logic.
LUTs implement Boolean functions, and multiplexers enable combinatorial logic. Two slices
form a configurable logic block (CLB). In order to distribute a clock signal to all these logic
blocks, the FPGA has five types of clock lines: BUFG, BUFR, BUFIO, BUFH, and high-performance clock. These lines satisfy “requirements of high fan out, short propagation delay,
and extremely low skew”. The clock lines are also split into categories depending on the sections
of the FPGA and components they drive. The three categories are: global, regional, and I/O lines.
Global clock lines drive all flip-flops, clock enables, and many logic inputs. Regional clock lines
drive all clock destinations in their region and two bordering regions. There are six to eighteen
regions in an FPGA. Finally, I/O clock lines are very fast and only drive I/O logic and
serializer/deserializer circuits.
3.1.2 ML605 Board
The Virtex-6 FPGA is included on the ML605 Development Board. In addition to the FPGA, the
development board includes a 512 MB DDR3 small outline dual inline memory module
(SODIMM), which our design arbitrates access to. A SODIMM is the type of board the memory is manufactured on. The board also includes 32 MB of linear BPI Flash and 8 Kb of IIC EEPROM.
Communication mechanisms provided on the board include Ethernet, SFP transceiver
connector, GTX port, USB to UART Bridge, USB host and peripheral port, and PCI Express.
The only connection used during this project was the USB JTAG connector. It was used to
program and debug the FPGA from the host computer.
There are three clock sources on the board: a 200 MHz differential oscillator, a 66 MHz single-ended oscillator, and SMA connectors for an external clock. This project utilizes the 200 MHz
oscillator. Peripherals on the ML605 board were useful for debugging purposes. The push
buttons were used to trigger sections of code execution in ChipScope such as reading and
writing from memory. Dip switches acted as configuration inputs to our code. For example,
they acted as a safety to ensure the buttons on the board were not automatically set to active
when the code was downloaded to the board. In addition, the value on the switches indicated
which system would begin writing first for debugging purposes. LEDs were used to check
functionality of sections of code as well, and, for additional validation, they can be used to indicate if an error has occurred. Although we did not use it, the ML605 board provides an LCD.
3.2 TOOLS
Now that the hardware on which the design is placed has been described, the software used to create and manipulate the design can be discussed. The tools for design include those provided within the Xilinx Integrated Software Environment, and the tools used for validation include iSim and ChipScope. This section also looks at the turn-around time for both validation tools and what it means for the design process.
3.2.1 Xilinx Integrated Software Environment (ISE)
We designed the arbiter using Verilog hardware description language in Xilinx Integrated
Software Environment (ISE). ISE is an environment in which the user can “take [their] design
from design entry through Xilinx device programming”. The main workbench for ISE is ISE
Project Navigator. The Project Navigator tool allows the user to effectively manage their
design and call upon development processes. Figure 3.1 shows a screen shot of ISE Project Navigator:
Figure 3.1 Screen Shot of ISE Project Navigator
Figure 3.1 shows some main windows in ISE Project Navigator. On the right hand side is the
window for code entry. The hierarchical view of modules in the design appears on the left, and when implementation is selected from the top, the design implementation progress is
shown in the bottom window. If simulation were selected instead of implementation there
would be an option to run the design for simulation.
The main processes called upon by ISE are synthesis, implementation, and bit stream
generation. During synthesis, Xilinx Synthesis Technology (XST) is called upon. XST
synthesizes Verilog, VHDL or mixed language designs and creates netlist files. Netlist files, or
NGC files, contain the design logic and constraints.
They are saved for use in the implementation process. During synthesis, the XST checks for
synthesis errors (parsing) and infers macros from the code. When XST infers macros, it recognizes parts of the code that can be replaced with components from its library, such as MUXes and RAMs, and encodes them in a way that is best for reduced area and/or increased speed.
Implementation is the longest process to perform on the design. The first step of
implementation is to combine the netlists and constraints into a design/NGD file. The NGD
file is the design file reduced to Xilinx primitives. This process is called translation. During the
second step, mapping, the design is fitted into the target device. This involves turning logic into
FPGA elements such as configurable logic blocks. Mapping produces a native circuit
description (NCD) file.
The third step, place and route, uses the mapped NCD file to place the design and route it according to the timing constraints. Finally, the program file is generated and, at the finish of this step, a bit stream is ready to be downloaded to the board.
3.2.2 Synthesis and Simulation
Once the design has been synthesized, simulation of the design is possible. Simulating a design
enables verification of logic functionality and timing. We used the simulation tool in ISE (iSim) to view timing and signal values. In order to utilize iSim, we created a test bench to provide the
design with stimulus. Since simulation only requires design synthesis, it is a relatively fast
process. The short turn-around time of simulation means we were able to iteratively test small
changes to the design and, therefore, debug our code efficiently.
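As a minimal sketch of what such a test bench looks like (the module and signal names below are illustrative, not the actual project files), the test bench only needs to generate a clock, drive stimulus into the design under test, and let iSim display the resulting waveforms:

`timescale 1ns / 1ps

// Trivial design used only so that the test bench below is self-contained.
module simple_reg (
  input  wire       clk,
  input  wire       rst_n,
  input  wire [7:0] din,
  output reg  [7:0] dout
);
  always @(posedge clk or negedge rst_n)
    if (!rst_n) dout <= 8'h00;
    else        dout <= din;
endmodule

// Minimal test bench sketch: generate a clock, apply stimulus, observe outputs.
module tb_simple_reg;
  reg        clk   = 1'b0;
  reg        rst_n = 1'b0;
  reg  [7:0] din   = 8'h00;
  wire [7:0] dout;

  always #5 clk = ~clk;            // 100 MHz clock

  simple_reg uut (.clk(clk), .rst_n(rst_n), .din(din), .dout(dout));

  initial begin
    #20 rst_n = 1'b1;              // release reset
    #10 din   = 8'hA5;             // drive a test value
    #20 $display("dout = %h", dout);
    #50 $finish;                   // end the simulation
  end
endmodule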
3.2.3 Implementation and Hardware Validation
Once the design was working in simulation, we still needed to test the design’s
functionality in hardware. Testing the design in hardware is the most reliable validation
method. In order to download the design to the board, it first needs to be implemented in ISE.
Implementation has a much longer turn-around time than synthesis, so while functionality in hardware ensures the design is working, simulation is the practical choice for iterative verification.
In order to test our design in hardware, we utilized ChipScope Pro Analyzer, a GUI which
allows the user to “configure [their] device, choose triggers, setup the console, and view results
of the capture on the fly”. In order to use ChipScope Pro, you may either insert ChipScope Pro cores into the design using the Core Generator, a tool that can be accessed in ISE Project Navigator, or utilize the Plan Ahead or Core Inserter tool, which automatically inserts cores into the design netlist for you. One method of inserting ChipScope cores into the design is by utilizing the Plan Ahead software. The Plan Ahead tool enables the creation of floorplans.

Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation
Floorplans provide an initial view of “the design’s interconnect flow and logic module sizes.” This helps the designer to “avoid timing, utilization, and routing congestion issues.” Plan Ahead also allows the designer to create and configure I/O ports and analyze implementation results, which aids in the discovery of bottlenecks in the design.
For our project, however, we utilized Plan Ahead only for its ability to automatically insert
ChipScope cores. Plan Ahead proved to be inefficient for our purposes since many times, when a
change was made in the design, the whole netlist would need to be selected again.
In addition, there were bugs in the software that greatly affected the turn-around time of debugging, and it crashed several times. If Plan Ahead were used for floor planning and other design tools, then it might have proved to be much more useful.
In place of Plan Ahead, we utilized the Core Generator within ISE. The ChipScope
cores provided by Xilinx include ICON, ILA, VIO, ATC2, and IBERT. The designer can
choose which cores to insert by using the Core Generator in ISE. The ICON core provides
communication between the different cores and the computer running ChipScope. It can connect
up to fifteen ILA, VIO, and ATC2 cores.
The ILA core is used to synchronously monitor internal signals. It contains logic to trigger inputs
and outputs and capture data. ILA cores allow up to sixteen trigger ports, which can be 1 to
256 bits wide. The VIO core can monitor signals like ILA, but also drive internal FPGA signals
real-time. The ATC2 core is similar to the ILA core, but was created for Agilent FPGA
dynamic probe technology. Finally, the IBERT core contains “all the logic to control, monitor, and change transceiver parameters and perform bit error ratio tests.”
The only ChipScope cores we were concerned with in this project were the ICON and ILA cores. We inserted one ChipScope ILA core and one ICON core using the Core Generator within ISE Project Navigator. The ILA core allowed us to monitor internal signals in the FPGA.
Instead of inserting a VIO core, which allows inputs to and outputs from ChipScope, we used
buttons to trigger the execution of write and read logic.
3.2.4 Analysis of Turn-Around Times
As introduced in sections 3.2.2 and 3.2.3, implementation takes much longer than synthesis.
Therefore, when it comes down to turn-around time, simulation is much more effective for
iterative debugging. In Figure 3.2, the phases for simulation and hardware validation can be
seen as well as the time it takes to complete each phase.
For simulation, the process starts at Verilog code, becomes synthesized logic and, using a test bench, is run in iSim for viewing. This process takes about eight minutes in total. A system’s simulation run-time is much longer than if it were running on hardware, but simulation is still faster than hardware validation because it does not have to undergo implementation.
The bottleneck in our simulation process is the set up time for the DDR3 memory model which
accounts for most of the simulation time. Hardware validation starts at Verilog code, is
synthesized, implemented, and imported into ChipScope. This whole process takes about fifteen
minutes.
Most of the time spent for hardware validation is on implementation of the design. In addition,
hardware validation requires more of the user’s attention. It is more difficult and takes more
time to set up a ChipScope core than it does to create a test bench for simulation. While a
test bench (green) involves writing some simple code, a ChipScope core (orange) involves
setting up all the signals to be probed. Not only is simulation faster, but the iSim tool is easier
to use than ChipScope. Figure 3.3 shows the iSim interface.
Figure 3.3 iSim Screen Shot
The screen shot of iSim shows the instance names in the first column, all the signals to choose
from in the second, and the signals and their waveforms in the third and fourth columns. The
user can view any signal without having to port it out of the design and re-implement like
when using ChipScope. When adding an additional signal in iSim, only simulation needs to be
restarted. The iSim interface makes debugging much easier with collapsible signal viewing,
grouping abilities, and a large window for viewing many signals at once.
A screen shot of ChipScope is shown in Figure 3.4. In ChipScope, you can view the devices, signals, triggers, and waveforms windows. The time window ChipScope is able to capture is much smaller than iSim's. For this reason, triggers are required to execute different parts of code; this is where the buttons were utilized. If a signal could not fit into the allowable number of signal inputs or was forgotten, it would need to be added to the design and implemented all over again, a much longer turn-around time than simulation. Therefore, simulation is used for iterative debugging and functionality testing, while hardware validation is the next step to ensure design accuracy.
Figure 3.4 ChipScope Screen Shot
3.2.5 Xilinx Core Generator
One tool in ISE that was very important to our project was the CORE Generator. The CORE Generator provided us with not only the ChipScope cores, but also the memory controller and FIFOs. It can be accessed within ISE Project Navigator and provides many additional functions for the designer.
The options provided for creating FIFOs, for example, include common or independent clocks; first-word fall-through; a variety of flags to indicate the amount of data in the FIFO; and the write width, read width, and depth.
The different width capabilities allowed us to create asynchronous FIFOs. The memory
controller was created using the Xilinx memory interface generator (MIG). There were options
to use an AXI4, native, or user interface, which is discussed in a following section on interfacing
with the Xilinx MIG.
CHAPTER 4
ARCHITECTURE
The SDR SDRAM Controller consists of four main modules: the SDRAM controller, control
interface, command, and data path modules. The SDRAM controller module is the top-level
module that instantiates the three lower modules and brings the whole design together. The
control interface module accepts commands and related memory addresses from the host,
decoding the command and passing the request to the command module. The command module
accepts commands and addresses from the control interface module, and generates the proper
commands to the SDRAM. The data path module handles the data path operations during
WRITEA and READA commands. The SDRAM controller module also instantiates a PLL that is
used in the CLOCK_LOCK mode to improve I/O timing. This PLL is not essential to the
operation of the SDR SDRAM Controller and can be easily removed.
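The following structural sketch shows how such a top level can simply wire the three lower modules together. The module and port names are placeholders chosen for illustration (the real modules have many more ports, and the PLL is omitted); the stub sub-modules exist only so the sketch elaborates on its own:

// Stub sub-modules standing in for the real control interface, command and
// data path modules, so that the top-level wiring below is self-contained.
module control_interface_stub (
  input  wire       clk,
  input  wire [2:0] cmd,
  output reg  [2:0] cmd_dec
);
  always @(posedge clk) cmd_dec <= cmd;                     // register/decode host commands
endmodule

module command_stub (
  input  wire       clk,
  input  wire [2:0] cmd_dec,
  output reg        ras_n, cas_n, we_n
);
  always @(posedge clk) {ras_n, cas_n, we_n} <= ~cmd_dec;   // drive SDRAM command pins
endmodule

module data_path_stub (
  input  wire        clk,
  input  wire [31:0] datain,
  output reg  [31:0] dataout
);
  always @(posedge clk) dataout <= datain;                  // pipeline data to/from the SDRAM
endmodule

// Top level: instantiates the three lower modules and connects them.
module sdram_controller_top (
  input  wire        clk,
  input  wire [2:0]  cmd,
  input  wire [31:0] datain,
  output wire [31:0] dataout,
  output wire        ras_n, cas_n, we_n
);
  wire [2:0] cmd_dec;

  control_interface_stub u_ctrl (.clk(clk), .cmd(cmd), .cmd_dec(cmd_dec));
  command_stub           u_cmd  (.clk(clk), .cmd_dec(cmd_dec),
                                 .ras_n(ras_n), .cas_n(cas_n), .we_n(we_n));
  data_path_stub         u_dp   (.clk(clk), .datain(datain), .dataout(dataout));
endmodule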
Figure 4.0 Architecture of SDRAM controller
4.1 CONTROL INTERFACE MODULE
The control interface module decodes and registers commands from the host, and passes the
decoded NOP, WRITEA, READA, REFRESH, PRECHARGE, and LOAD_MODE commands,
and ADDR to the command module. The LOAD_REG1 and LOAD_REG2 commands are
decoded and used internally to load the REG1 and REG2 registers with values from ADDR.
Figure 4.1 shows the control interface module block diagram.
Figure 4.1 Control Interface Module
The control interface module also contains a 16-bit down counter and control circuit that is used
to generate periodic refresh commands to the command module. The 16-bit down counter is
loaded with the value from REG2 and counts down to zero. The REFRESH_REQ output is
asserted when the counter reaches zero and remains asserted until the command module
acknowledges the request. The acknowledge from the command module causes the down counter
to be reloaded with REG2 and the process repeats. REG2 is a 16-bit value that represents the
period between REFRESH commands that the SDR SDRAM Controller issues. The value is set
by the equation int (refresh_period/clock_period).
For example, if an SDRAM device that is connected to the SDR SDRAM Controller has a 64-ms,
4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least
every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked by a
100-MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562d.
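A sketch of this refresh-request logic is shown below; the module and signal names are assumptions made for illustration, not the exact names used in the design:

// Sketch of the periodic refresh-request logic described above (names assumed).
// REG2 holds int(refresh_period / clock_period); with a 100 MHz clock and a
// 64 ms / 4096-cycle refresh requirement, REG2 would be at most 1562.
module refresh_timer (
  input  wire        clk,
  input  wire        rst,
  input  wire [15:0] reg2,         // refresh period in clock cycles
  input  wire        refresh_ack,  // acknowledge from the command module
  output reg         refresh_req   // periodic refresh request
);
  reg [15:0] count;

  always @(posedge clk) begin
    if (rst) begin
      count       <= reg2;
      refresh_req <= 1'b0;
    end else if (refresh_ack) begin
      count       <= reg2;         // reload on acknowledge and repeat
      refresh_req <= 1'b0;
    end else if (count == 16'd0) begin
      refresh_req <= 1'b1;         // request held until acknowledged
    end else begin
      count <= count - 16'd1;      // 16-bit down counter
    end
  end
endmodule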
4.2 COMMAND MODULE
The command module accepts decoded commands from the control interface module, refresh
requests from the refresh control logic, and generates the appropriate commands to the SDRAM.
The module contains a simple arbiter that arbitrates between the commands from the host
interface and the refresh requests from the refresh control logic. The refresh requests from the
refresh control logic have priority over the commands from the host interface. If a command from
the host arrives at the same time as, or during, a hidden refresh operation, the arbiter holds off the host by not asserting CMDACK until the hidden refresh operation is complete. If a hidden refresh command is received while a host operation is in progress, the hidden refresh is held off until the
host operation is complete. Figure 4.2 shows the command module block diagram.
Figure 4.2 Command Module Block Diagram
After the arbiter has accepted a command from the host, the command is passed onto the
command generator portion of the command module. The command module uses three shift
registers to generate the appropriate timing between the commands that are issued to the
SDRAM. One shift register is used to control the timing the ACTIVATE command; a second is
used to control the positioning of the READA or WRITEA commands; a third is used to time
command durations, which allows the arbiter to determine if the last requested operation has been
completed.
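As a simplified sketch of this shift-register technique (signal and parameter names are assumed for illustration), a one-hot token can be loaded when an access is accepted and shifted every cycle; each tap then marks the cycle on which a particular SDRAM command should be driven:

// Simplified sketch of shift-register command spacing (names assumed).
// A one-hot value is loaded when an access starts; as it shifts, each tap marks
// the cycle on which a particular SDRAM command should be issued.
module cmd_spacer #(
  parameter RCD = 3                    // ACTIVATE-to-READ/WRITE delay in clocks
) (
  input  wire clk,
  input  wire start,                   // accepted host command
  output wire do_activate,             // drive ACTIVATE this cycle
  output wire do_rw                    // drive READA/WRITEA this cycle
);
  reg [7:0] shift;

  always @(posedge clk)
    if (start) shift <= 8'b0000_0001;  // load a single one-hot token
    else       shift <= {shift[6:0], 1'b0};

  assign do_activate = shift[0];
  assign do_rw       = shift[RCD];     // tap selected by the programmed RCD
endmodule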
The command module also performs the multiplexing of the address to the SDRAM. The row
portion of the address is multiplexed out to the SDRAM outputs A[11:0] during the
ACTIVATE(RAS) command. The column portion is then multiplexed out to the SDRAM address
outputs during a READA (CAS) or WRITEA command.
The output signal OE is generated by the command module to control the tristate buffers in the last stage of the DATAIN path in the data path module.
4.3 DATA PATH MODULE
The data path module provides the SDRAM data interface to the host. Host data is accepted on DATAIN for WRITEA commands, and data is provided to the host on DATAOUT during READA commands.
Figure 4.3 shows the data path module block diagram.
Figure 4.3 Data Path Module
The DATAIN path consists of a 2-stage pipeline to align data properly relative to CMDACK and the commands that are issued to the SDRAM. DATAOUT consists of a 2-stage pipeline that registers data from the SDRAM during a READA command. The DATAOUT pipeline delay can be reduced to one or even zero registers, with the only effect that the relationship of DATAOUT to CMDACK changes.
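A sketch of these pipelines is given below; the widths, port names and tristate-enable handling are assumptions for illustration rather than the exact interface of the design:

// Sketch of the 2-stage DATAIN/DATAOUT pipelines (widths and names assumed).
module data_path_sketch #(
  parameter WIDTH = 32
) (
  input  wire             clk,
  input  wire             oe,             // from the command module
  input  wire [WIDTH-1:0] datain,         // host write data
  input  wire [WIDTH-1:0] sdram_dq_in,    // read data from the SDRAM
  output wire [WIDTH-1:0] sdram_dq_out,   // write data to the SDRAM
  output wire             sdram_dq_oe,    // enables the tristate drivers
  output reg  [WIDTH-1:0] dataout         // read data back to the host
);
  reg [WIDTH-1:0] din_r1, din_r2;         // 2-stage DATAIN pipeline
  reg [WIDTH-1:0] dout_r1;                // first DATAOUT stage

  always @(posedge clk) begin
    din_r1  <= datain;
    din_r2  <= din_r1;                    // aligns data with the SDRAM command
    dout_r1 <= sdram_dq_in;
    dataout <= dout_r1;                   // second DATAOUT stage
  end

  assign sdram_dq_out = din_r2;
  assign sdram_dq_oe  = oe;               // OE gates the last stage onto DQ
endmodule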
CHAPTER 5
OPERATION
The single data rate (SDR) synchronous dynamic random access memory (SDRAM) controller
provides a simplified interface to industry standard SDR SDRAM. The SDR SDRAM Controller
is available in either Verilog HDL or VHDL and is optimized for the architecture. The SDR
SDRAM Controller supports the following features:
 Burst lengths of 1, 2, 4, or 8 data words.
 CAS latency of 2 or 3 clock cycles.
 16-bit programmable refresh counter used for automatic refresh.
 Two chip selects for SDRAM devices.
 Supports the NOP, READA, WRITEA, AUTO_REFRESH, PRECHARGE, ACTIVATE,
BURST_STOP, and LOAD_MR commands.
 Support for full-page mode operation.
 Data mask line for write operations.
 PLL to increase system performance.
Figure 5.0 SDR SDRAM Controller System-Level Diagram
5.1 SDRAM OVERVIEW
SDRAM is high-speed dynamic random access memory (DRAM) with a synchronous interface.
The synchronous interface and fully-pipelined internal architecture of SDRAM allows extremely
fast data rates if used efficiently. Internally, SDRAM devices are organized in banks of memory,
which are addressed by row and column. The number of row- and column-address bits and the
number of banks depends on the size of the memory.
SDRAM is controlled by bus commands that are formed using combinations of the RASN, CASN, and WEN signals. For instance, on a clock cycle where all three signals are high, the associated command is a no operation (NOP). A NOP is also indicated when the chip select is not asserted.
Table 5.1 shows the standard SDRAM bus commands.
Table 5.1 SDRAM Bus Commands
SDRAM banks must be opened before a range of addresses can be written to or read from. The
row and bank to be opened are registered coincident with the ACT command.
When a bank is accessed for a read or a write it may be necessary to close the bank and re-open it
if the row to be accessed is different than the row that is currently opened.
Closing a bank is done with the PCH command.
The primary commands used to access SDRAM are RD and WR. When the WR command is issued, the initial column address and data word are registered. When a RD command is issued, the initial address is registered. The initial data appears on the data bus 1 to 3 clock cycles later.
This is known as CAS latency and is due to the time required to physically read the internal DRAM
core and register the data on the bus. The CAS latency depends on the speed of the SDRAM and
the frequency of the memory clock. In general, the faster the clock, the more cycles of CAS latency
are required. After the initial RD or WR command, sequential reads and writes continue until the burst length is reached or a BT command is issued. SDRAM memory devices support burst lengths
of 1, 2, 4, or 8 data cycles. The ARF is issued periodically to ensure data retention. This function is
performed by the SDR SDRAM Controller and is transparent to the user.
The LMR is used to configure the SDRAM mode register which stores the CAS latency, burst
length, burst type, and write burst mode. Consult the SDRAM specification for additional details.
SDRAM comes in dual in-line memory modules (DIMMs), small-outline DIMMs (SO-DIMMs)
and chips. To reduce pin count SDRAM row and column addresses are multiplexed on the same
pins. SDRAM often includes more than one bank of memory internally, and DIMMs may require multiple chip selects.
5.2 FUNCTIONAL DESCRIPTION
Table 5.2 shows the SDR SDRAM Controller interface signals. All signals are synchronous to the system clock, and outputs are registered at the SDR SDRAM Controller’s outputs.
Table 5.2 Interface Signals
5.3 SDRAM CONTROLLER COMMAND INTERFACE
The SDR SDRAM Controller provides a synchronous command interface to the SDRAM and several control registers. Table 5.3 shows the commands, which are described in the following sections. The following rules apply to the commands, with reference to Table 5.2:
 All commands, except NOP, are driven by the user onto CMD [2:0]; ADDR and DATAIN are set appropriately for the requested command. The controller registers the command on the next rising clock edge.
 To acknowledge the command, the controller asserts CMDACK for one clock period.
 For READA or WRITEA commands, the user should start receiving or writing data on DATAOUT and DATAIN.
 The user must drive NOP onto CMD [2:0] by the next rising clock edge after CMDACK is asserted.
Table 5.3 Interface Commands
5.3.1 NOP Command
NOP is a no-operation command to the controller. When NOP is detected by the controller, it performs a NOP in the following clock cycle. A NOP must be issued in the clock cycle following the controller's acknowledgement of a command.
The NOP command has no effect on SDRAM accesses that are already in progress.
5.3.2 READA Command
Figure 5.1 Timing diagram for a READA command
The READA command instructs the SDR SDRAM Controller to perform a burst read with auto-
precharge to the SDRAM at the memory address specified by ADDR. The SDR SDRAM
Controller issues an ACTIVATE command to the SDRAM followed by a READA command. The
read burst data first appears on DATAOUT (RCD + CL + 2) clocks after the SDR SDRAM Controller asserts CMDACK. During a READA command the user must keep DM low.
When the controller is configured for full-page mode, the READA command becomes READ (READ without auto-precharge). Figure 5.1 shows an example timing diagram for a READA command.
The following sequence describes the general operation of the READA command:
 The user asserts READA, ADDR and DM.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.
 One clock after CMDACK is asserted, the user must assert NOP.
 The controller presents the first read burst value on DATAOUT; the remainder of the read burst follows on every clock cycle.
5.3.3 WRITEA Command
Figure 5.2 Timing diagram for a WRITEA command
The WRITEA command instructs the SDR SDRAM Controller to perform a burst write with auto-
precharge to the SDRAM at the memory address specified by ADDR.
The SDR SDRAM Controller will issue an ACTIVATE command to the SDRAM followed by a WRITEA command. The first data value in the burst sequence must be presented with the WRITEA command and the ADDR address. The host must start clocking data, along with the desired DM values, into the SDR SDRAM Controller (tRCD – 2) clocks after the SDR SDRAM Controller has acknowledged the WRITEA command.
See an SDRAM data sheet for how to use the data mask lines DM/DQM. When the SDR SDRAM Controller is in full-page mode, WRITEA becomes WRITE (write without auto-precharge).
Figure 5.2 shows an example timing diagram for a WRITEA command. The following sequence
describes the general operation of a WRITEA command:
 The user asserts WRITEA, ADDR, the first write data value on DATAIN, and the desired
data mask value on DM with reference to the table 5.2 and 5.3.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and
simultaneously starts issuing commands to the SDRAM devices.
 One clock after CMDACK was asserted, the user asserts NOP on CMD.
 The user clocks data and data mask values into the SDR SDRAM Controller through
DATAIN and DM.
5.3.4 REFRESH Command
The REFRESH command instructs the SDR SDRAM Controller to perform an ARF command to
the SDRAM. The SDR SDRAM Controller acknowledges the REFRESH command with
CMDACK. Figure 5.3 shows an example timing diagram of the REFRESH command.
Figure 5.3 Timing diagram for a REFRESH command
The following sequence describes the general operation of a REFRESH command:
 The user asserts REFRESH on the CMD input.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.
 The user asserts NOP on CMD.
5.3.5 PRECHARGE Command
Figure 5.4 Timing diagram for a PRECHARGE command
The PRECHARGE command instructs the SDR SDRAM Controller to perform a PCH command
to the SDRAM. The SDR SDRAM Controller acknowledges the command with CMDACK. The
PCH command is also used to generate a burst stop to the SDRAM. Using PRECHARGE to
terminate a burst is only supported in the full-page mode.
Note that the SDR SDRAM Controller adds a latency of 4 clocks from when the host issues a command to when the SDRAM sees the PRECHARGE command. If a full-page read burst is to be stopped after 100 cycles, the PRECHARGE command must be asserted (4 + CL – 1) clocks before the desired end of the burst (the CL – 1 requirement is imposed by the SDRAM devices). So if the CAS latency is 3, the PRECHARGE command must be issued (100 – 3 – 1 – 4) = 92 clocks into the burst.
Figure 5.4 shows an example timing diagram of the PRECHARGE command. The following
sequence describes the general operation of a PRECHARGE command:
 The user asserts PRECHARGE on CMD.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and
simultaneously starts issuing commands to the SDRAM devices.
 The user asserts NOP on CMD.
5.3.6 LOAD_MODE Command
The LOAD_MODE command instructs the SDR SDRAM Controller to perform a LMR command to the SDRAM. The value that is to be written into the SDRAM mode register must be present on ADDR [11:0] with the LOAD_MODE command. The value on ADDR [11:0] is mapped directly to the SDRAM pins A11–A0 when the SDR SDRAM Controller issues the LMR to the SDRAM. Figure 5.5 shows an example timing diagram. The following sequence describes the general operation of a LOAD_MODE command:
 The user asserts LOAD_MODE on CMD.
 The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.
 One clock after the SDR SDRAM Controller asserts CMDACK, the user asserts NOP on CMD.
Figure 5.5 Timing diagram for a LOAD_MODE command
5.3.7 LOAD_REG1 Command
The LOAD_REG1 command instructs the SDR SDRAM Controller to load the internal configuration register REG1 with the value presented on ADDR. Table 5.4 shows the REG1 bit definitions.
Table 5.4 REG1 Bit Definitions
CL is the CAS latency of the SDRAM memory in clock periods and is dependent on the memory device speed grade and clock frequency. Consult the SDRAM data sheet for the appropriate settings. CL must be set to the same value as the CL of the SDRAM memory devices.

RCD is the RAS-to-CAS delay in clock periods and is dependent on the SDRAM speed grade and clock frequency. RCD = INT(tRCD/clock_period), where tRCD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock.

RRD is the refresh-to-RAS delay in clock periods. RRD is dependent on the SDRAM speed grade and clock frequency. RRD = INT(tRRD/clock_period), where tRRD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock.

PM is the page mode bit. If PM = 0, the SDR SDRAM Controller operates in non-page mode. If PM = 1, the SDR SDRAM Controller operates in page mode. See the section “Full-Page Mode Operation” for more information.

BL is the burst length the SDRAM devices have been configured for.
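As a purely hypothetical example of filling in these fields for a 100 MHz controller clock (the tRCD and tRRD numbers below are placeholders; the real values come from the SDRAM data sheet):

module reg1_example;
  // Hypothetical timing numbers for a 100 MHz (10 ns) controller clock.
  localparam CLK_PERIOD_NS = 10;
  localparam T_RCD_NS      = 20;                 // assumed data-sheet tRCD
  localparam T_RRD_NS      = 14;                 // assumed data-sheet tRRD

  localparam CL  = 3;                            // must match the SDRAM mode register
  localparam RCD = T_RCD_NS / CLK_PERIOD_NS;     // INT(tRCD/clock_period) = 2 here
  localparam RRD = T_RRD_NS / CLK_PERIOD_NS;     // INT(tRRD/clock_period) = 1 here
  localparam PM  = 0;                            // non-page mode
  localparam BL  = 8;                            // burst length of the SDRAM devices

  initial $display("CL=%0d RCD=%0d RRD=%0d PM=%0d BL=%0d", CL, RCD, RRD, PM, BL);
endmodule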
5.3.8 LOAD_REG2 Command
The LOAD_REG2 command instructs the SDR SDRAM Controller to load the internal configuration
register REG2. REG2 is a 16-bit value that represents the period between REFRESH commands that the
SDR SDRAM Controller issues. The value is set by the equation int (refresh_period/clock period).
For example, if an SDRAM device connected to the SDR SDRAM Controller has a 64-ms, 4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked by a 100 MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562d. The value that is to be written into REG2 must be presented on the ADDR input simultaneously with the assertion of the LOAD_REG2 command.
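The same REG2 calculation, written out as a small self-checking Verilog fragment (the constant names are illustrative):

module reg2_example;
  // 64 ms / 4096 rows = 15625 ns between REFRESH commands;
  // at a 100 MHz (10 ns) clock this is at most 1562 clock cycles.
  localparam REFRESH_PERIOD_NS = 64_000_000 / 4096;                 // 15625 ns
  localparam CLK_PERIOD_NS     = 10;
  localparam REG2_MAX          = REFRESH_PERIOD_NS / CLK_PERIOD_NS; // 1562

  initial $display("REG2 maximum value = %0d", REG2_MAX);
endmodule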
CHAPTER 6
ELEMENTS OF MEMORY BANK
6.1 DECODER
A decoder is a device which does the reverse operation of an encoder, undoing the encoding so that the original information can be retrieved. The same method used to encode is usually just reversed in order to decode. It is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines.
6.1.1 A 2-to-4 line single-bit decoder
In digital electronics, a decoder can take the form of a multiple-input, multiple-output logic circuit that converts coded inputs into coded outputs, where the input and output codes are different, e.g. n-to-2^n and binary-coded decimal decoders. Enable inputs must be on for the decoder to function; otherwise its outputs assume a single "disabled" output code word. Decoding is necessary in applications such as data multiplexing, 7-segment displays and memory address decoding.

The simplest example of a decoder circuit would be an AND gate, because the output of an AND gate is "High" (1) only when all its inputs are "High". Such an output is called an "active-high output". If a NAND gate is connected instead of the AND gate, the output will be "Low" (0) only when all its inputs are "High". Such an output is called an "active-low output". A slightly more complex decoder would be the n-to-2^n type binary decoders. These types of decoders are combinational circuits that convert binary information from 'n' coded inputs to a maximum of 2^n unique outputs. We say a maximum of 2^n outputs because, in case the 'n'-bit coded information has unused bit combinations, the decoder may have fewer than 2^n outputs.
39
We can have a 2-to-4 decoder, a 3-to-8 decoder or a 4-to-16 decoder. We can form a 3-to-8 decoder from two 2-to-4 decoders (with enable signals).
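As a concrete illustration, a minimal Verilog sketch of a 2-to-4 single-bit decoder with an active-high enable is given below. It is a simplified stand-in for the RTL summarized in Figure 6.1, not the exact code used in the project.

// Minimal 2-to-4 decoder sketch with active-high enable (illustrative only).
module decoder2to4 (
  input  wire       en,   // enable; all outputs low when de-asserted
  input  wire [1:0] a,    // 2-bit binary input
  output reg  [3:0] y     // one-hot, active-high output lines
);
  always @* begin
    y = 4'b0000;          // default "disabled" output code word
    if (en)
      y[a] = 1'b1;        // assert only the selected output line
  end
endmodule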
Figure 6.1 RTL of decoder
Similarly, we can also form a 4-to-16 decoder by combining two 3-to-8 decoders. In this type of
circuit design, the enable inputs of both 3-to-8 decoders originate from a 4th input, which acts as
a selector between the two 3-to-8 decoders. This allows the 4th input to enable either the top or
bottom decoder, which produces outputs of D(0) through D(7) for the first decoder, and D(8)
through D(15) for the second decoder.
Figure 6.2 Simulation Of Decoder
A decoder that contains enable inputs is also known as a decoder-demultiplexer. Thus, we have a
4-to-16 decoder produced by adding a 4th input shared among both decoders, producing 16
outputs.
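The composition described above can be sketched in the same style: the extra address bit drives the enables of the two smaller decoders. The sketch below shows a 3-to-8 decoder built from two of the 2-to-4 decoders sketched earlier (again an illustration, not the project's exact RTL); a 4-to-16 decoder follows the same pattern using two 3-to-8 decoders.

// 3-to-8 decoder built from two 2-to-4 decoders; a[2] selects which half is enabled.
module decoder3to8 (
  input  wire       en,
  input  wire [2:0] a,
  output wire [7:0] y
);
  decoder2to4 u_lo (.en(en & ~a[2]), .a(a[1:0]), .y(y[3:0]));  // outputs D0-D3
  decoder2to4 u_hi (.en(en &  a[2]), .a(a[1:0]), .y(y[7:4]));  // outputs D4-D7
endmodule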
6.2 DEMUX
The data distributor, known more commonly as a demultiplexer or "demux" for short, is the exact opposite of the multiplexer described in Section 6.4. The demultiplexer converts a serial data signal at its input into parallel data.
Figure 6.3 RTL Of DEMUX
The demultiplexer takes a single input data line and switches it to any one of a number of individual output lines, one at a time.
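A minimal sketch of a 1-to-4 demultiplexer in the same illustrative style (not the project's exact RTL behind Figures 6.3 and 6.4):

// 1-to-4 demultiplexer sketch: routes the single input to one selected output line.
module demux1to4 (
  input  wire       d,     // single data input
  input  wire [1:0] sel,   // selects the destination output line
  output reg  [3:0] y      // individual output lines
);
  always @* begin
    y      = 4'b0000;      // non-selected outputs stay low
    y[sel] = d;            // route the input to the selected line
  end
endmodule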
Figure 6.4 Simulation Of DEMUX
6.3 RAM
Random-access memory (RAM) is a form of computer data storage. A random-access memory
device allows data items to be read and written in roughly the same amount of time regardless of
the order in which data items are accessed. In contrast, with other direct-access data storage
media such as hard disks, CD-RWs, DVD-RWs and the older drum memory, the time required to
read and write data items varies significantly depending on their physical locations on the
recording medium, due to mechanical limitations such as media rotation speeds and arm
movement delays.
Today, random-access memory takes the form of integrated circuits. Strictly speaking, modern
types of DRAM are not random access, as data is read in bursts, although the name DRAM /
RAM has stuck. However, many types of SRAM are still random access even in a strict sense.
RAM is normally associated with volatile types of memory (such as DRAM memory modules),
where stored information is lost if the power is removed, although many efforts have been made
to develop non-volatile RAM chips. Other types of non-volatile memory exist that allow random
access for read operations, but either do not allow write operations or have limitations on them.
These include most types of ROM and a type of flash memory called NOR-Flash.
6.3.1 TYPES OF RAM
The two main forms of modern RAM are static RAM (SRAM) and dynamic RAM (DRAM). In
SRAM, a bit of data is stored using the state of a flip-flop. This form of RAM is more expensive
to produce, but is generally faster and requires less power than DRAM and, in modern
computers, is often used as cache memory for the CPU. DRAM stores a bit of data using a
transistor and capacitor pair, which together comprise a memory cell. The capacitor holds a high
or low charge (1 or 0, respectively), and the transistor acts as a switch that lets the control
circuitry on the chip read the capacitor's state of charge or change it. As this form of memory is
less expensive to produce than static RAM, it is the predominant form of computer memory used
in modern computers.
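For illustration, a small synchronous single-port RAM can be sketched in Verilog as below. This is a generic behavioural model with assumed parameter and port names, not the exact memory block shown in Figure 6.5.

// Generic single-port synchronous RAM sketch (behavioural, illustrative only).
module ram_sp #(
  parameter DATA_WIDTH = 8,
  parameter ADDR_WIDTH = 4
) (
  input  wire                  clk,
  input  wire                  we,     // write enable
  input  wire [ADDR_WIDTH-1:0] addr,
  input  wire [DATA_WIDTH-1:0] din,
  output reg  [DATA_WIDTH-1:0] dout
);
  reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];
  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;    // synchronous write
    dout <= mem[addr];     // registered read (returns old data during a write)
  end
endmodule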
Figure 6.5 RTL of RAM
Both static and dynamic RAM are considered volatile, as their state is lost or reset when power is
removed from the system. By contrast, read-only memory (ROM) stores data by permanently
enabling or disabling selected transistors, such that the memory cannot be altered. Writeable
variants of ROM (such as EEPROM and flash memory) share properties of both ROM and RAM,
enabling data to persist without power and to be updated without requiring special equipment.
These persistent forms of semiconductor ROM include USB flash drives, memory cards for
cameras and portable devices, etc. ECC memory (which can be either SRAM or DRAM) includes
special circuitry to detect and/or correct random faults (memory errors) in the stored data,
using parity bits or error correction code.
In general, the term RAM refers solely to solid-state memory devices (either DRAM or SRAM),
and more specifically the main memory in most computers. In optical storage, the term DVD-RAM is somewhat of a misnomer since, unlike CD-RW or DVD-RW, it does not need to be erased before reuse. Nevertheless, a DVD-RAM behaves much like a hard disk drive, if somewhat slower.
Figure 6.6 Simulation of RAM
6.4 MUX
In electronics, a multiplexer is a device that selects one of several analog or digital input signals and forwards the selected input onto a single line. A multiplexer of 2^n inputs has n select lines, which are used to select which input line to send to the output. Multiplexers are mainly used to increase the amount of data that can be sent over a network within a certain amount of time and bandwidth. A multiplexer is also called a data selector.
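A minimal 4-to-1 multiplexer sketch in the same illustrative style, with assumed port names:

// 4-to-1 multiplexer sketch: the select lines choose which input reaches the output.
module mux4to1 (
  input  wire [3:0] d,     // four data inputs
  input  wire [1:0] sel,   // select lines (2 select lines for 2^2 inputs)
  output wire       y      // single output line
);
  assign y = d[sel];       // forward the selected input
endmodule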
Figure 6.7 RTL of MUX
An electronic multiplexer can be considered as a multiple-input, single-output switch, and a
demultiplexer as a single-input, multiple-output switch. The schematic symbol for a multiplexer
is an isosceles trapezoid with the longer parallel side containing the input pins and the short
parallel side containing the output pin.
Conceptually, a 2-to-1 multiplexer behaves like a changeover switch in which the switch position connects the desired input to the output. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal.
Figure 6.8 Simulation Of MUX
6.5 BUFFER
A buffer amplifier (sometimes simply called a buffer) is one that provides electrical impedance
transformation from one circuit to another. Two main types of buffer exist: the voltage buffer and the current buffer.
6.5.1 VOLTAGE BUFFER
A voltage buffer amplifier is used to transfer a voltage from a first circuit, having a high output
impedance level, to a second circuit with a low input impedance level. The interposed buffer
amplifier prevents the second circuit from loading the first circuit unacceptably and interfering
with its desired operation. In the ideal voltage buffer in the diagram, the input resistance is
infinite, the output resistance zero (impedance of an ideal voltage source is zero). Other
properties of the ideal buffer are: perfect linearity, regardless of signal amplitudes; and instant
output response, regardless of the speed of the input signal.
If the voltage is transferred unchanged (the voltage gain Av is 1), the amplifier is a unity gain
buffer; also known as a voltage follower because the output voltage follows or tracks the input
voltage. Although the voltage gain of a voltage buffer amplifier may be (approximately) unity, it
usually provides considerable current gain and thus power gain. However, it is commonplace to
say that it has a gain of 1 (or the equivalent 0 dB), referring to the voltage gain.
As an example, consider a Thévenin source (voltage VA, series resistance RA) driving a resistor load RL. Because of voltage division (also referred to as "loading"), the voltage across the load is only VA RL / (RL + RA). However, if the Thévenin source drives a unity-gain buffer, the voltage input to the amplifier is VA, with no voltage division, because the amplifier input resistance is infinite. At the output, the dependent voltage source delivers voltage Av VA = VA to the load, again without voltage division, because the output resistance of the buffer is zero. A Thévenin equivalent circuit of the combined original Thévenin source and the buffer is an ideal voltage source VA with zero Thévenin resistance.
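In the digital RTL context of this project the buffer is, of course, a logic element rather than an analog amplifier. A minimal sketch of a tri-state output buffer, assuming an output-enable control, is given below; it is illustrative only and not the exact RTL behind Figure 6.9.

// Tri-state buffer sketch: drives the input through when enabled, otherwise high-impedance.
module buffer_tristate (
  input  wire oe,   // output enable
  input  wire d,    // data input
  output wire y     // buffered output
);
  assign y = oe ? d : 1'bz;
endmodule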
Figure 6.9 RTL Of Buffer
6.5.2 CURRENT BUFFER
Typically a current buffer amplifier is used to transfer a current from a first circuit, having a
low output impedance level, to a second circuit with a high input impedance level. The interposed
buffer amplifier prevents the second circuit from loading the first circuit unacceptably and
interfering with its desired operation.
In the ideal current buffer in the diagram, the input impedance is zero and the output impedance
is infinite (impedance of an ideal current source is infinite). Again, other properties of the ideal
buffer are: perfect linearity, regardless of signal amplitudes; and instant output response,
regardless of the speed of the input signal.
For a current buffer, if the current is transferred unchanged (the current gain βi is 1), the amplifier
is again a unity gain buffer; this time known as a current follower because the output
current follows or tracks the input current.
Figure 6.10 Simulation of Buffer
As an example, consider a Norton source (current IA, parallel resistance RA) driving a resistor load RL. Because of current division (also referred to as "loading"), the current delivered to the load is only IA RA / (RL + RA). However, if the Norton source drives a unity-gain current buffer, the current input to the amplifier is IA, with no current division, because the amplifier input resistance is zero. At the output, the dependent current source delivers current βi IA = IA to the load, again without current division, because the output resistance of the buffer is infinite. A Norton equivalent circuit of the combined original Norton source and the buffer is an ideal current source IA with infinite Norton resistance.
6.6 MEMORY BANK
A memory bank is a logical unit of storage in electronics, which is hardware dependent. In
a computer the memory bank may be determined by the memory access controller along with
physical organization of the hardware memory slots.
In a typical synchronous dynamic random-access memory (SDRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM), a bank consists of multiple rows and columns of storage units and is usually spread out across several chips. In a single read or write operation, only one bank is accessed; the bits accessed per bank, per chip, therefore add up to the memory bus width in bits (for a single channel). The size of a bank is further determined by the number of bits in a column and a row, per chip, multiplied by the number of chips in a bank.
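Putting the pieces together, a bank of four RAMs selected by a bank address can be sketched as below, reusing the ram_sp model sketched in Section 6.3. The write enable is demultiplexed to the selected bank and the read data is multiplexed back, mirroring the decoder/demux/RAM/mux structure summarized in Figure 6.11; the widths and names are assumptions for illustration, not the project's exact RTL.

// Memory-bank sketch: four RAM banks, one selected per access (illustrative only).
module memory_bank_sketch #(
  parameter DATA_WIDTH = 8,
  parameter ADDR_WIDTH = 4             // address bits within one bank
) (
  input  wire                  clk,
  input  wire                  we,
  input  wire [1:0]            bank,   // bank-select address
  input  wire [ADDR_WIDTH-1:0] addr,
  input  wire [DATA_WIDTH-1:0] din,
  output reg  [DATA_WIDTH-1:0] dout
);
  wire [DATA_WIDTH-1:0] bank_dout [0:3];

  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : g_bank
      // The write enable is demultiplexed: only the addressed bank is written.
      ram_sp #(.DATA_WIDTH(DATA_WIDTH), .ADDR_WIDTH(ADDR_WIDTH)) u_ram (
        .clk (clk),
        .we  (we && (bank == i)),
        .addr(addr),
        .din (din),
        .dout(bank_dout[i])
      );
    end
  endgenerate

  // Read data is multiplexed from the selected bank.
  always @* begin
    case (bank)
      2'd0:    dout = bank_dout[0];
      2'd1:    dout = bank_dout[1];
      2'd2:    dout = bank_dout[2];
      default: dout = bank_dout[3];
    endcase
  end
endmodule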
Figure 6.11 RTL Of Memory Bank
Some computers have several identical memory banks of RAM, and use bank switching to switch
between them. Harvard architecture computers have (at least) 2 very different banks of memory,
one for program storage and one for data storage.
Figure 6.12 Simulation Of Memory Bank
CHAPTER 7
RESULTS AND CONCLUSIONS
7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON
7.1.1 Project
Table 7.1 Project
7.1.2 Device
Table 7.2 Device
7.1.3 Environment
Table 7.3 Environment
7.1.4 Default Activity
Table 7.4 Default Activity
7.1.5 On-Chip Power Summary
Table 7.5 On-Chip Power Summary
7.1.6 Thermal Summary
Table 7.6 Thermal Summary
7.1.7 Power Supply Summary
Table 7.7 Power Supply Summary
Table 7.8 Power Supply Current
7.1.8 Confidence Level
Table 7.9 Confidence Level
7.1.9 By Hierarchy
Table 7.10 By Hierarchy
7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE
7.2.1 Project
Table 7.11 Project
7.2.2 Device
Table 7.12 Device
7.2.3 Environment
Table 7.13 Environment
7.2.4 Default Activity Rates
Table 7.14 Default Activity
7.2.5 On-Chip Power Summary
Table 7.15 On-Chip Power Summary
7.2.6 Thermal Summary
Table 7.16 Thermal Summary
7.2.7 Power Supply Summary
Table 7.17 Power Supply Summary
Table 7.18 Power Supply Current
7.2.8 Confidence Level
Table 7.19 Confidence Level
7.2.9 By Hierarchy
Table 7.20 By Hierarchy
7.3 CONCLUSION
This project addresses the problem of finding a memory map for firm real-time workloads in the context of SDRAM memory controllers. Existing controllers use either a static memory map or provide only limited configurability. We use the number of banks over which requests are interleaved as a flexible configuration parameter, while previous work considers it a fixed part of the controller architecture. We use this degree of freedom to optimize the memory configuration for the mix of applications and their requirements. This is beneficial for the worst-case performance in terms of bandwidth, latency and power.
CHAPTER 8
FUTURE SCOPE
The advantage of this controller, compared to SDR SDRAM, DDR1 SDRAM and DDR2 SDRAM controllers, is that it synchronizes the data transfer, the data transfer is twice as fast as in the previous generation, and the production cost is very low.
The design has been successfully described in Verilog HDL and synthesized using the Xilinx tool chain.
1. DDR4 SDRAM is the 4th generation of DDR SDRAM.
2. DDR3 SDRAM improves on DDR SDRAM by using differential signalling and lower voltages
to support significant performance advantages over DDR SDRAM.
3. DDR3 SDRAM standards are still being developed and improved.
REFERENCES
[1] C. van Berkel, “Multi-core for Mobile Phones,” in Proc. DATE, 2009.
[2] “International Technology Roadmap for Semiconductors (ITRS),” 2009.
[3] P. Kollig et al., “Heterogeneous Multi-Core Platform for Consumer Multimedia
Applications,” in Proc. DATE, 2009.
[4] L. Steffens et al., “Real-Time Analysis for Memory Access in Media Processing SoCs : A
Practical Approach,” Proc. ECRTS, 2008.
[5] S. Bayliss et al., "Methodology for designing statically scheduled application-specific SDRAM controllers using constrained local search," in Proc. FPT, 2009.
[6] B. Akesson et al., “Architectures and modelling of predictable memory controllers for
improved system integration,” in Proc. DATE, 2011.
[7] J. Reineke et al., “PRET DRAM Controller: Bank Privatization for Predictability and
Temporal Isolation,” in Proc. CODES+ISSS, 2011.
[8] M. Paolieri et al., “An Analyzable Memory Controller for Hard Real-Time CMPs,”
Embedded Systems Letters, IEEE, vol. 1, no. 4, 2009.
[9] Micron Technology Inc., “DDR3-800-1Gb SDRAM Datasheet, 02/10 EN edition,” 2006.
[10] D. Stiliadis et al., "Latency-rate servers: a general model for analysis of traffic scheduling algorithms," IEEE/ACM Trans. Netw., 1998.
[11] B. Akesson et al., "Classification and Analysis of Predictable Memory Patterns," in Proc. RTCSA, 2010.
[12] DDR2 SDRAM Specification, JESD79-2E ed., JEDEC Solid State Technology
Association, 2008.
[13] DDR3 SDRAM Specification, JESD79-3D ed., JEDEC Solid State Technology
Association, 2009.
[14] K. Chandrasekar et al., “Improved Power Modelling of DDR SDRAMs,” in Proc. DSD,
2011.
[15] B. Akesson et al., “Automatic Generation of Efficient Predictable Memory Patterns,” in
Proc. RTCSA, 2011.
Contenu connexe

Tendances

Tendances (20)

FHSS- Frequency Hop Spread Spectrum
FHSS- Frequency Hop Spread SpectrumFHSS- Frequency Hop Spread Spectrum
FHSS- Frequency Hop Spread Spectrum
 
Arm instruction set
Arm instruction setArm instruction set
Arm instruction set
 
USART
USARTUSART
USART
 
Addressing modes of 8085
Addressing modes of 8085Addressing modes of 8085
Addressing modes of 8085
 
Instruction Set Architecture
Instruction Set ArchitectureInstruction Set Architecture
Instruction Set Architecture
 
Internal architecture-of-8086
Internal architecture-of-8086Internal architecture-of-8086
Internal architecture-of-8086
 
Leaky bucket algorithm
Leaky bucket algorithmLeaky bucket algorithm
Leaky bucket algorithm
 
UART
UARTUART
UART
 
Array multiplier
Array multiplierArray multiplier
Array multiplier
 
CS6003 AD HOC AND SENSOR NETWORKS
CS6003 AD HOC AND SENSOR NETWORKSCS6003 AD HOC AND SENSOR NETWORKS
CS6003 AD HOC AND SENSOR NETWORKS
 
Microprocessor 8085 complete
Microprocessor 8085 completeMicroprocessor 8085 complete
Microprocessor 8085 complete
 
VHDL- data types
VHDL- data typesVHDL- data types
VHDL- data types
 
User Datagram Protocol
User Datagram ProtocolUser Datagram Protocol
User Datagram Protocol
 
Scrambling
ScramblingScrambling
Scrambling
 
Flag register 8086 assignment
Flag register 8086 assignmentFlag register 8086 assignment
Flag register 8086 assignment
 
8253ppt
8253ppt8253ppt
8253ppt
 
I2C
I2CI2C
I2C
 
8051 serial communication
8051 serial communication8051 serial communication
8051 serial communication
 
8086 ppt
8086 ppt8086 ppt
8086 ppt
 
Floating point arithmetic operations (1)
Floating point arithmetic operations (1)Floating point arithmetic operations (1)
Floating point arithmetic operations (1)
 

Similaire à Memory map selection of real time sdram controller using verilog full project report

IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET Journal
 
system on chip for telecommand system design
system on chip for telecommand system designsystem on chip for telecommand system design
system on chip for telecommand system designRaghavendra Badager
 
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory SubsystemModeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory SubsystemIRJET Journal
 
z/VM Performance Analysis
z/VM Performance Analysisz/VM Performance Analysis
z/VM Performance AnalysisRodrigo Campos
 
Abstract The Prospect of 3D-IC
Abstract The Prospect of 3D-ICAbstract The Prospect of 3D-IC
Abstract The Prospect of 3D-ICvishnu murthy
 
Exploiting arm linux
Exploiting arm linuxExploiting arm linux
Exploiting arm linuxDan H
 
Oow2007 performance
Oow2007 performanceOow2007 performance
Oow2007 performanceRicky Zhu
 
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...IBM India Smarter Computing
 
Arm cortex (lpc 2148) based motor speed
Arm cortex (lpc 2148) based motor speedArm cortex (lpc 2148) based motor speed
Arm cortex (lpc 2148) based motor speedUday Wankar
 
Introduction to Real Time Java
Introduction to Real Time JavaIntroduction to Real Time Java
Introduction to Real Time JavaDeniz Oguz
 
Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313EMC
 
Autonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with CassandraAutonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with CassandraEmiliano
 
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...IOSR Journals
 

Similaire à Memory map selection of real time sdram controller using verilog full project report (20)

IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
 
Adaptive bank management[1]
Adaptive bank management[1]Adaptive bank management[1]
Adaptive bank management[1]
 
system on chip for telecommand system design
system on chip for telecommand system designsystem on chip for telecommand system design
system on chip for telecommand system design
 
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory SubsystemModeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
Modeling of DDR4 Memory and Advanced Verifications of DDR4 Memory Subsystem
 
DDR DIMM Design
DDR DIMM DesignDDR DIMM Design
DDR DIMM Design
 
z/VM Performance Analysis
z/VM Performance Analysisz/VM Performance Analysis
z/VM Performance Analysis
 
RAMinate Invited Talk at NII
RAMinate Invited Talk at NIIRAMinate Invited Talk at NII
RAMinate Invited Talk at NII
 
Abstract The Prospect of 3D-IC
Abstract The Prospect of 3D-ICAbstract The Prospect of 3D-IC
Abstract The Prospect of 3D-IC
 
Exploiting arm linux
Exploiting arm linuxExploiting arm linux
Exploiting arm linux
 
IBM Power10.pdf
IBM Power10.pdfIBM Power10.pdf
IBM Power10.pdf
 
Rhel Tuningand Optimizationfor Oracle V11
Rhel Tuningand Optimizationfor Oracle V11Rhel Tuningand Optimizationfor Oracle V11
Rhel Tuningand Optimizationfor Oracle V11
 
Oow2007 performance
Oow2007 performanceOow2007 performance
Oow2007 performance
 
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
Intelligent storage management solution using VMware vSphere 5.0 Storage DRS:...
 
Arm cortex (lpc 2148) based motor speed
Arm cortex (lpc 2148) based motor speedArm cortex (lpc 2148) based motor speed
Arm cortex (lpc 2148) based motor speed
 
dissertation
dissertationdissertation
dissertation
 
Introduction to Real Time Java
Introduction to Real Time JavaIntroduction to Real Time Java
Introduction to Real Time Java
 
Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313
 
Autonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with CassandraAutonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with Cassandra
 
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
Design, Validation and Correlation of Characterized SODIMM Modules Supporting...
 
Memperf
MemperfMemperf
Memperf
 

Plus de rahul kumar verma

SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment  SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment rahul kumar verma
 
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignmentSMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignmentrahul kumar verma
 
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...rahul kumar verma
 
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignmentSMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignmentrahul kumar verma
 
SMU DRIVE SPRING 2017 MBA101– Management Process and Organizational Behavio...
 SMU DRIVE SPRING 2017  MBA101– Management Process and Organizational Behavio... SMU DRIVE SPRING 2017  MBA101– Management Process and Organizational Behavio...
SMU DRIVE SPRING 2017 MBA101– Management Process and Organizational Behavio...rahul kumar verma
 
SMU DRIVE SPRING 2017 MBA105 - MANAGERIAL ECONOMICS free solved assignment
 SMU DRIVE SPRING 2017  MBA105 - MANAGERIAL ECONOMICS free solved assignment SMU DRIVE SPRING 2017  MBA105 - MANAGERIAL ECONOMICS free solved assignment
SMU DRIVE SPRING 2017 MBA105 - MANAGERIAL ECONOMICS free solved assignmentrahul kumar verma
 
SMU DRIVE SPRING 2017 MBA106 –Human Resource Management free solved assignment
SMU DRIVE SPRING 2017  MBA106 –Human Resource Management free solved assignmentSMU DRIVE SPRING 2017  MBA106 –Human Resource Management free solved assignment
SMU DRIVE SPRING 2017 MBA106 –Human Resource Management free solved assignmentrahul kumar verma
 
SMU DRIVE SPRING 2017 MBA102 - Business Communication free solved assignment
SMU DRIVE SPRING 2017  MBA102 - Business Communication free solved assignment SMU DRIVE SPRING 2017  MBA102 - Business Communication free solved assignment
SMU DRIVE SPRING 2017 MBA102 - Business Communication free solved assignment rahul kumar verma
 
FPGA in outer space seminar report
FPGA in outer space seminar reportFPGA in outer space seminar report
FPGA in outer space seminar reportrahul kumar verma
 

Plus de rahul kumar verma (11)

SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment  SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
SMU DRIVE FALL 2017 MBA 205 – Operation research solved free assignment
 
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignmentSMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
SMU DRIVE FALL 2017 MBA 202 – financial management solved free assignment
 
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
SMU DRIVE FALL 2017 MBA 204 – management information systems solved free assi...
 
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignmentSMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
SMU DRIVE SPRING 2017 MBA 103- Statistics for Management solved free assignment
 
SMU DRIVE SPRING 2017 MBA101– Management Process and Organizational Behavio...
 SMU DRIVE SPRING 2017  MBA101– Management Process and Organizational Behavio... SMU DRIVE SPRING 2017  MBA101– Management Process and Organizational Behavio...
SMU DRIVE SPRING 2017 MBA101– Management Process and Organizational Behavio...
 
SMU DRIVE SPRING 2017 MBA105 - MANAGERIAL ECONOMICS free solved assignment
 SMU DRIVE SPRING 2017  MBA105 - MANAGERIAL ECONOMICS free solved assignment SMU DRIVE SPRING 2017  MBA105 - MANAGERIAL ECONOMICS free solved assignment
SMU DRIVE SPRING 2017 MBA105 - MANAGERIAL ECONOMICS free solved assignment
 
SMU DRIVE SPRING 2017 MBA106 –Human Resource Management free solved assignment
SMU DRIVE SPRING 2017  MBA106 –Human Resource Management free solved assignmentSMU DRIVE SPRING 2017  MBA106 –Human Resource Management free solved assignment
SMU DRIVE SPRING 2017 MBA106 –Human Resource Management free solved assignment
 
SMU DRIVE SPRING 2017 MBA102 - Business Communication free solved assignment
SMU DRIVE SPRING 2017  MBA102 - Business Communication free solved assignment SMU DRIVE SPRING 2017  MBA102 - Business Communication free solved assignment
SMU DRIVE SPRING 2017 MBA102 - Business Communication free solved assignment
 
CCNA DUMPS 640-802
CCNA DUMPS 640-802CCNA DUMPS 640-802
CCNA DUMPS 640-802
 
CCNA DUMPS 200-120
CCNA DUMPS 200-120CCNA DUMPS 200-120
CCNA DUMPS 200-120
 
FPGA in outer space seminar report
FPGA in outer space seminar reportFPGA in outer space seminar report
FPGA in outer space seminar report
 

Dernier

Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 

Dernier (20)

Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 

Memory map selection of real time sdram controller using verilog full project report

  • 1. Project Report On Memory map selection of real time SDRAM controller using Verilog By RAHUL VERMA (9015694258)
  • 2. vi TABLE OF CONTENTS Page DECLARATION ............................................................................................................................ii CERTIFICATE .............................................................................................................................iii ACKNOWLEDGEMENTS ..........................................................................................................iv ABSTRACT ..................................................................................................................................vi LIST OF FIGURES .....................................................................................................................vii LIST OF TABLES........................................................................................................................viii LIST OF ABBREVIATION……………………………………………………………………...ix CHAPTER 1 (INTRODUCTION)……………………………………………………………..01 1.1 LITERATURE SURVEY……………………………………………………...02 1.2 GOAL OF THE PROJECT…………………………………………………….03 CHAPTER 2 (BACKGROUND)………………………………………………………………04 2.1 RANDOM ACCESS MEMORY…………………………………........ ……..04 2.2 STATIC RANDOM ACCESS MEMORY …………………………………....04 2.3 DYNAMIC RANDOM ACCESS MEMORY ……………..………………….05 2.4 DEVELOPMENT OF DRAM ………………………………………………...06 2.4.1 DRAM …………………………………………………………………...07 2.4.2 SYNCHRONOUS DRAM……………………………………………….07 2.4.3 DDR1SDRAM…………………………………………………………....08 2.4.4 DDR2SDRAM……………………………………………………………08
  • 3. vii 2.4.5 DDR3SDRAM………………………………………………………..…09 2.5 TIMELINE……………………………………………………………………09 CHAPTER 3 (METHODOLOGY)…………………………………………………………...11 3.1 HARDWARE…………………………………………………………………11 3.1.1 VIRTEX-6FPGA………………………………………………………..11 3.1.2 ML605 BOARD………………………………………………………...12 3.2 TOOLS………………………………………………………………………..12 3.2.1 XILINX INTERGRATED SOFTWARE ENVIRONMENT(ISE)……..13 3.2.2 SYNTHESIS AND SIMULATION……………………. ……………..14 3.2.3 IMPLEMENTATION AND HARDWARE VALIDATION…………...14 3.2.4 ANALYSIS OF TURN-AROUND TIMES…………………………….17 3.2.5 XILINX CORE GENERATOR…………………………………………19 CHAPTER 4 (ARCHITECTURE)……………………………………………………………20 4.1 CONTROL INTERFACE MODULE…………………………………………21 4.2 COMMAND MODULE…………………….……………….………………...22 4.3 DAPATH MODULE…………………………………………………………..24 CHAPTER 5 (OPERATION).....................................................................................................25 5.1 SDRAM OVERVIEW…………………………………………………………26 5.2 FUNCTIONAL DESCRIPTION………………………………………………27 5.3 SDRAM CONTROLLER COMMAND INTERFACE……………………….28 5.3.1 NOP COMMAND……………………………………………………….29 5.3.2 READA COMMAND…………………………………………………...30 5.3.3 WRITEA COMMAND……………………………………………….…31 5.3.4 REFRESH COMMAND…………………………………………….…..32
  • 4. viii 5.3.5 PRECHARGE COMMAND………………………………………….....34 5.3.6 LOAD_MODE COMMAND……………………………………………35 5.3.7 LOAD_REG1 COMMAND……………………………………………..36 5.3.8 LOAD_REG2 COMMAND……………………………………………..37 CHAPTER 6 (ELEMENTS OF MEMORY BANK)…………………………………………38 6.1 DECODER…………………………………………………………………….38 6.1.1 A 2 TO 4 SINGLE BIT DECODER…………………………………….38 6.2 DEMUX………………………………………………………………………..40 6.3 RAM…………………………………………………………………………...41 6.3.1 TYPES OF RAM………………………………………………………...42 6.4 MUX…………………………………………………………………………...44 6.5 BUFFER……………………………………………………………………….45 6.5.1 VOLTAGE BUFFER…………………………………………………….46 6.5.2 CURRENT BUFFER…………………………………………………….47 6.6 MEMORY BANK……………………………………………………………..48 CHAPTER 7 (RESULT AND CONCLUSIONS)……………………………………………..51 7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON…………..………51 7.1.1 PROJECT………………………………………………………...………51 7.1.2 DEVICE ………………………………………………………………….51 7.1.3 ENVIRONMENT …………………………………………………….,,,,.52 7.1.4 DEFAULT ACTIVITY………...………………………………….……..52 7.1.5 ON-CHIP POWER SUMMARY………………………………………...53 7.1.6 THERMAL SUMMARY………………………………………………...53 7.1.7 POWER SUPPLY SUMMARY………………………………………….53 7.1.8 CONFIDENCE LEVEL………………………………………………….54 7.1.9 BY HIERARCHY………………………………………………………..55
  • 5. ix 7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE…..56 7.2.1. PROJECT………………………………………………………………..56 7.2.2 DEVICE……………………………………………………………….....56 7.2.3 ENVIRONMENT………………………………………………………...57 7.2.4 DEFAULT ACTIVITY RATES…………………………………………57 7.2.5 ON-CHIP POWER SUMMARY………………………………………...58 7.2.6 THERMAL SUMMARY………………………………………………...58 7.2.7 POWER SUPPLY SUMMARY……………………………………...….58 7.2.8 CONFIDENCE LEVEL………………………………………………….59 7.2.9 BY HIERARCHY………………………………………………………..60 7.3 CONCLUSION…………………………………………………………….….60 CHAPTER 8 (FUTURE SCOPE)……………………………………………………………...61 REFERENCES...............................................................................................................................62
  • 6. x LIST OF FIGURES Page Figure 2.1 DRAM Row Access Latency vs. Year 09 Figure 2.2 DRAM Column Address Time vs. Year 10 Figure 3.1 Screenshot of ISE Project Navigator 13 Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation 15 Figure 3.3 ISim Screen Shot 18 Figure 3.4 CHIPSCOPE Screen Shot 19 Figure 4.0 Architecture of SDRAM controller 20 Figure 4.1 Control Interface Module 21 Figure 4.2 Command Module Block Diagram 23 Figure 4.3 Data Path Module 24 Figure 5.0 SDR SDRAM Controller System-Level Diagram 25 Figure 5.1 Timing diagram for a READA command 30 Figure 5.2 Timing diagram for a WRITEA command 31 Figure 5.3 Timing diagram for a REFRESH command 32 Figure 5.4Timing diagram for a PRECHARGE command 34 Figure 5.5 Timing diagram for a PRECHARGE command 35 Figure 6.1 RTL of decoder 39
  • 7. xi Figure 6.2 Simulation of Decoder 40 Figure 6.3 RTL of DEMUX 41 Figure 6.4 Simulation Of DEMUX 42 Figure 6.5 RTL of RAM 44 Figure 6.6 Simulation of RAM 44 Figure 6.7 RTL of MUX 46 Figure 6.8 Simulation of MUX 46 Figure 6.9 RTL of Buffer 48 Figure 6.10 Simulation of Buffer 49 Figure 6.11 RTL of Memory Bank 50 Figure 6.12 Simulation of Memory Bank 50
  • 8. xii LIST OF TABLES Page Table 5.1 SDRAM Bus Commands 26 Table 5.2 Interface Signals 28 Table 5.3 Interface Commands 29 Table 5.4 REG1 Bit Definitions 36 Table 7.1 Project 51 Table 7.2 Device 51 Table 7.3 Environment 52 Table 7.4 Default Activity 52 Table 7.5 On-Chip Power Summary 53 Table 7.6 Thermal Summary 53 Table 7.7 Power Supply Summary 53 Table 7.8 Power Supply Current 54 Table 7.9 Confidence Level 54 Table 7.10 By Hierarchy 55 Table 7.11 Project 56 Table 7.12 Device 56
  • 9. xiii Table 7.13 Environment 57 Table 7.14 Default Activity 57 Table 7.15 On-Chip Power Summary 58 Table 7.16 Thermal Summary 58 Table 7.17 Power Supply Summary 58 Table 7.18 Power Supply Current 59 Table 7.19 Confidence Level 59 Table 7.20 By Hierarchy 60
  • 10. xiv LIST OF ABBREVIATIONS A/D Analog To Digital CAS Column Address Strobing CLB Configurable Logic Block DRAM Dynamic Random-Access Memory FPGA Field-Programmable Gate Array ISE Integrated Software Environment I/O Input/ Output LUTs Look-Up Tables NCD Native Circuit Description RAM Random Access Memory RAS Row Address Strobing ROM Read Only Memory SDRAM Synchronous Dynamic Random-Access Memory SRAM Static Random-Access Memory XST Xilinx Synthesis Technology
  • 11. 1 CHAPTER 1 INTRODUCTION Embedded applications with real-time requirements are mapped to heterogeneous multiprocessor systems. The computational demands placed upon these systems are continuously increasing, while power and area budgets limit the amount of resources that can be expended to reduce costs, applications are often forced to share hardware resources. Functional correctness for Real- Time application is only guaranteed if their timing requirements are considered throughout the entire system when the requirements are not met, it may cause an unacceptable loss of functionality or severe quality degradation. We focus on the real-time properties of the (off-chip) memory. SDRAM is a commonly used memory type because it provides a large amount of storage space at low cost per bit. It comprises a hierarchical structure of banks and rows that have to be opened and closed explicitly by the memory controller, where only one row in each bank can be open at a time. Requests to the open row are served at a low latency, while request to a different row results in a high latency, since it requires closing the open row and subsequent opening of the requested row. Locality thus strongly influences the performance of the memory subsystem. The worst-case (minimum) bandwidth and worst-case (maxi- mum) latency are determined by the way requests are mapped to the memory. The worst-case latency can be optimized by accessing the memory at a small granularity (i.e. few words), such that the individual requests take a small amount of time to complete. This allows fine-grained sharing of the memory resource, at the expense of efficiency, since the overhead of opening and closing rows is amortized over only a small number of bits. Latency sensitive requests like cache misses favor this configuration. Conversely, to optimize for bandwidth, the memory has to be used as efficiently as possible, which requires memory maps that use a large access granularity.
  • 12. 2 Existing memory controllers offer only limited configurability of the memory mapping and are unable to balance this trade-off based on the application requirements .A memory controller must take the latency and bandwidth requirements of all of its applications into account, while staying within the given power budget. This requires an understanding of the effect that different memory maps have on the attainable worst-case bandwidth, latency and power. 1.1 LITERATURE SURVEY Synchronous DRAM (SDRAM) has become a mainstream memory of choice in embedded system memory design due to its speed, burst access and pipeline features. For high-end applications using processors such as Motorola MPC 8260 or Intel StrongArm, the interface to the SDRAM is supported by the processor’s built-in peripheral module. However, for other applications, the system designer must design a controller to provide proper commands for SDRAM initialization, read/write accesses and memory refresh. In some cases, SDRAM is chosen because the previous generations of DRAM (FP and EDO) are either end-of-life or not recommended for new designs by the memory vendors. From the board design point of view, design using earlier generations of DRAM is much easier and more straightforward than using SDRAM unless the system bus master provides the SDRAM interface module as mentioned above. This SDRAM controller reference design, located between the SDRAM and the bus master, reduces the user’s effort to deal with the SDRAM command interface by providing a simple generic system interface to the bus master. In today's SDRAM market, there are two major types of SDRAM distinguished by their data transfer rates. The most common single data rate (SDR) SDRAM transfers data on the rising edge of the clock. The other is the double data rate (DDR) SDRAM which transfers data on both the rising and falling edge to double the data transfer throughput. Other than the data transfer phase, the different power-on initialization and mode register definitions, these two SDRAMs share the same command set and basic design concepts. This reference design is targeted for SDR SDRAM, however, due to the similarity of SDR and DDR SDRAM, this design can also be modified for a DDR SDRAM controller.
  • 13. 3 For illustration purposes, the Micron SDR SDRAM MT48LC32M4A2 (8Meg x 4 x 4 banks) is chosen for this design. Also, this design has been verified by using Micron’s simulation model. It is highly recommended to download the simulation model from the SDRAM vendors for timing simulation when any modifications are made to this design. Several SDRAM controllers focusing on real-time applications have been proposed, all trying to maximize the worst case performance. Uses a static command schedule computed at design time. Full knowledge of the application behavior is thus required, making it unable to deal with dynamism in the request streams. The controller proposed in dynamically schedules pre- computed sequences of SDRAM commands according to a fixed set of scheduling rules. The controller proposed in follows a similar approach. Dynamically schedules commands at run-time according to a set of rules from which an upper bound on the latency of a request is determined and use a memory map that always interleaves requests over all banks in the SDRAM, which sets a high lower bound on the smallest request size that can be supported efficiently. Supports multiple bursts to each bank in an access to increase guaranteed bandwidth for large requests. Allows only single burst accesses to all banks in a fixed sequential manner, although multiple banks can be clustered to create a single logical resource. None of the mentioned controllers take power into account, despite it being an increasingly important design constraint. 1.1 GOAL OF THE PROJECT 1) We explore the full memory map design space by allowing requests to be interleaved over a variable number of banks. This reduces the minimum access granularity and can thus be beneficial for applications with small requests or tight latency constraints. 2) We propose a configuration methodology that is aware of the real-time and power constraints, such that an optimal memory map can be selected.
  • 14. 4 CHAPTER 2 BACKGROUND There are two different types of random access memory: synchronous and dynamic. Synchronous random access memory (SRAM) is used for high-speed, low power applications while dynamic random access memory (DRAM) is used for its low cost and high density. Designers have been working to make DRAM faster and more energy efficient .The following sections will discuss the differences between these two types of RAM, as well as present the progression of DRAM towards a faster, more energy efficient design. 2.1 RANDOM ACCESS MEMORY Today, the most common type of memory used in digital systems is random access memory (RAM). The time it takes to access RAM is not affected by the data’s location in memory. RAM is volatile, meaning if power is removed, then the stored data is lost. As a result, RAM cannot be used for permanent storage. However, RAM is used during runtime to quickly store and retrieve data that is being operated on by a computer. In contrast, nonvolatile memory, such as hard disks, can be used for storing data even when not powered on. Unfortunately, it takes much longer for the computer to store and access data from this memory. There are two types of RAM: static and dynamic. In the following sections the differences between the two types and the evolution of DRAM will be discussed. 2.2 STATIC RANDOM ACCESS MEMORY Static random access memory (SRAM) stores data as long as power is being supplied to the chip.
  • 15. 5 Each memory cell of SRAM stores one bit of data using six transistors: a flip flop and two access transistors (i.e. four transistors). SRAM is the faster of the two types of RAM because it does not involve capacitors, which involve sense amplification of a small charge. For this reason, it is used in cache memory of computers. Additionally, SRAM requires a very small amount of power to maintain its data in standby mode Although SRAM is fast and energy efficient it is also expensive due to the amount of silicon needed for its large cell size. This presented the need for a denser memory cell, which brought about DRAM. 2.3 DYNAMIC RANDOM ACCESS MEMORY According to Wakerly , “In order to build RAMs with higher density (more bits per chip), chip designers invented memory cells that use as little as one transistor per bit. Each DRAM cell consists of one transistor and a capacitor. Since capacitors “leak” or lose charge over time, DRAM must have a refresh cycle to prevent data loss. According to a high-performance DRAM study on earlier versions of DRAM, DRAM’s refresh cycle is one reason DRAM is slower than SRAM. The cells of DRAM use sense amplifiers to transmit data to the output buffer in the case of a read and transmit data back to the memory cell in the case of a refresh. During a refresh cycle, the sense amplifier reads the degraded value on a capacitor into a D- Latch and writes back the same value to the capacitor so it is charged correctly for 1 or 0. Since all rows of memory must be refreshed and the sense amplifier must determine the value of a, already small, degenerated capacitance, refresh takes a significant amount of time. The refresh cycle typically occurs about every 64 milliseconds the refresh rate of the latest DRAM (DDR3) is about 1 microsecond. Although refresh increases memory access time, according to a high-performance DRAM study on earlier versions of DRAM, the greatest amount of time is lost during row addressing, more specifically, “[extracting] the required data from the sense amps/row caches” . During addressing, the memory controller first strobes the row address (RAS) onto the address bus. Once the RAS is sent, a sense amplifier (one for each cell in the row) determines if a charge indicating a 1 or 0 is loaded into each capacitor.
  • 16. 6 This step is long because “the sense amplifier has to read a very weak charge” and “the row is formed by the gates of memory cells.” The controller then chooses a cell in the row from which to read from by strobing the column address (CAS) onto the address bus. A write requires the enable signal to be asserted at the same time as the CAS, while a read requires the enable signal to be de-asserted. The time it takes the data to move onto the bus after the CAS is called the CAS latency. Although recent generations of DRAM are still slower than SRAM, DRAM is used when a largeramount of memory is required since it is less expensive. For example, in embedded systems, a small block of SRAM is used for the critical data path, and a large block of DRAM is used to satisfy all other needs .The following section will discuss the development of DRAM into a faster, more energy efficient memory. 2.4 DEVELOPMENT OF DRAM Many factors are considered in the development of high performance RAM. Ideally, the developer would always like memory to transfer more data and respond in less time; memory would have higher bandwidth and lower latency. However, improving upon one factor often involves sacrificing the other. Bandwidth is the amount of data transferred per second. It depends on the width of the data bus and the frequency at which data is being transferred. Latency is the time between when the address strobe is sent to memory and when the data is placed on the data bus. DRAM is slower than SRAM because it periodically disables the refresh cycle and because it takes a much longer time to extract data onto the memory bus. Advancements have been, however, to several different aspects of DRAM to increase bandwidth and decrease latency. Over time, DRAM has evolved to become faster and more energy efficient by decreasing in cell size and increasing in capacity. In the following section, we will look at different types of DRAM and how DDR3 memory has come to be.
  • 17. 7 2.4.1 DRAM One of the reasons the original DRAM was very slow is its extensive addressing overhead. In the original DRAM, an address was required for every 64-bit access to memory. Each access took six clock cycles. For four 64-bit accesses to consecutive addresses in memory, the notation for timing was 6-6-6-6. Dashes separate memory accesses and the numbers indicate how long each access takes. In this DRAM timing example, it took 24 cycles to access the memory four times. In contrast, more recent DRAM implements burst technology, which can send many 64-bit words to consecutive addresses. While the first access still takes six clock cycles due to memory addressing, each of the next three adjacent addresses can be accessed in as little as one clock cycle since the addressing does not need to be repeated. During burst mode, the timing would be 6-1-1-1, a total of nine clock cycles. The original DRAM is also slower than its descendants because it is asynchronous. This means there is no memory bus clock to synchronize the input and output signals of the memory chip. The timing specifications are not based on a clock edge, but rather on maximum and minimum timing values (in seconds). The user would need to design a state machine with idle states, which may be inconsistent when running the memory at different frequencies. 2.4.2 Synchronous DRAM In order to decrease latency, SDRAM utilizes a memory bus clock to synchronize signals to and from the system and memory. Synchronization ensures that the memory controller does not need to follow strict timing; it simplifies the implemented logic and reduces memory access latency. With a synchronous bus, data is available at each clock cycle. SDRAM divides memory into two to four banks for concurrent access to different parts of memory. Simultaneous access allows continuous data flow by ensuring there will always be a memory bank ready for access. The addition of banks adds another segment to the addressing, resulting in a bank, row, and column address.
  • 18. 8 The memory controller determines whether an access addresses the same bank and row as the previous access, in which case only a column address strobe must be sent. This allows the access to occur much more quickly and can decrease overall latency. 2.4.3 DDR1 SDRAM DDR1 SDRAM (the first generation of DDR SDRAM) doubles the data rate (hence the term DDR) of SDRAM without changing the clock speed or frequency. DDR transfers data on both the rising and falling edges of the clock, has a prefetch buffer, and uses low-voltage signaling, which makes it more energy efficient than previous designs. Unlike SDRAM, which transfers 1 bit per clock cycle from the memory array to the data queue, DDR1 transfers 2 bits to the queue in two separate pipelines. The bits are released in order on the same output line. This is called a 2n-prefetch architecture. In addition, DDR1 utilizes double-transition clocking by triggering on both the rising and falling edges of the clock to transfer data. As a result, the bandwidth of DDR1 is doubled without an increase in the clock frequency. In addition to doubling the bandwidth, DDR1 made advances in energy efficiency. DDR1 can operate at 2.5 V instead of the 3.3 V operating point of SDRAM thanks to low-voltage signaling technology. 2.4.4 DDR2 SDRAM Data rates of DDR2 SDRAM are up to eight times those of the original SDRAM. At an operating voltage of 1.8 V, it achieves lower power consumption than DDR1. DDR2 SDRAM has a 4-bit prefetch buffer, an improvement over the DDR1 2-bit prefetch. This means that 4 bits are transferred per clock cycle from the memory array to the data bus, which increases bandwidth.
  • 19. 9 2.4.5 DDR3 SDRAM DDR3 provides two burst modes for both reading and writing: burst chop (BC4) and burst length eight (BL8). BC4 allows bursts of four by treating data as though half of it is masked. This creates a smooth transition when switching from DDR2 to DDR3 memory. However, burst mode BL8 is the primary burst mode. BL8 allows the most data to be transferred in the least amount of time; it transfers the greatest number of 64-bit data packets (eight) to or from consecutive addresses in memory, which means addressing occurs once for every eight data packets sent. In order to support a burst length of eight data packets, DDR3 SDRAM has an 8-bit prefetch buffer. DDR3, like its predecessors, improves not only bandwidth but also energy conservation. Power consumption of DDR3 can be up to 30 percent less than that of DDR2. The DDR3 operating voltage is the lowest yet, at 1.5 V, and low-voltage versions are supported at 1.35 V. 2.5 TIMELINE Ideally, memory performance would improve at the same rate as central processing unit (CPU) performance. However, memory latency has only improved about five percent each year. The longest latency (RAS latency) of the newest release of DRAM for each year is shown in the plot in Figure 2.1. Figure 2.1 DRAM Row Access Latency vs. Year
  • 20. 10 As seen in Figure 2.1, the row access latency decreases linearly with every new release of DRAM until 1996. Once SDRAM is released in 1996, the difference in latency from year to year is much smaller. With recent memory releases it is much more difficult to reduce RAS latency. This can be seen especially for the DDR2 and DDR3 memory releases from 2006 to 2012. CAS latency, unlike RAS latency, consistently decreases (bandwidth increases) with every memory release, and in the new DDR3 memory it is very close to 0 ns. Figure 2.2 shows the column access latency. Figure 2.2 DRAM Column Address Time vs. Year Looking at some prominent areas of the CAS graph, it can be seen in Figure 2.2 that bandwidth greatly increased (CAS decreased) from 1983 to 1986. This is due to the switch from NMOS DRAMs to CMOS DRAMs. In 1996 the first SDRAM was released. The CAS latency decreased (bandwidth increased) due to synchronization and banking. In later years, the CAS latency does not decrease by much, but this is expected since the latency is already much smaller. Comparing Figure 2.2 to Figure 2.1, CAS time decreases much more drastically than RAS time. This means the bandwidth greatly improves, while latency improves much more slowly. In 2010, when DDR2 was released, it can be seen that latency was sacrificed (Figure 2.1) for an increase in bandwidth (Figure 2.2).
  • 21. 11 CHAPTER 3 METHODOLOGY In this section the ML605 and Virtex-6 board hardware is described, as well as the tools utilized for design and validation. The Xilinx Integrated Software Environment (ISE) was used for design, and iSim and ChipScope were used for validation in simulation and in hardware. 3.1 HARDWARE 3.1.1 Virtex-6 FPGA The Virtex-6 FPGA (XC6VLX240T) is used to implement the arbiter. This FPGA has 241,152 logic cells and is organized into banks (40 pins per bank). These logic cells, or slices, are composed of four look-up tables (LUTs), multiplexers and arithmetic carry logic. LUTs implement Boolean functions, and multiplexers enable combinatorial logic. Two slices form a configurable logic block (CLB). In order to distribute a clock signal to all these logic blocks, the FPGA has five types of clock lines: BUFG, BUFR, BUFIO, BUFH, and high-performance clock. These lines satisfy "requirements of high fan out, short propagation delay, and extremely low skew". The clock lines are also split into categories depending on the sections of the FPGA and the components they drive. The three categories are: global, regional, and I/O lines. Global clock lines drive all flip-flops, clock enables, and many logic inputs. Regional clock lines drive all clock destinations in their region and two bordering regions. There are six to eighteen regions in an FPGA. Finally, I/O clock lines are very fast and only drive I/O logic and serializer/deserializer circuits.
  • 22. 12 3.1.2 ML605 Board The Virtex-6 FPGA is included on the ML605 Development Board. In addition to the FPGA, the development board includes a 512 MB DDR3 small outline dual inline memory module (SODIMM), which our design arbitrates access to. A SODIMM is the type of board the memory is manufactured on. The board also includes 32 MB of linear BPI Flash and 8 Kb of IIC EEPROM. Communication mechanisms provided on the board include Ethernet, an SFP transceiver connector, a GTX port, a USB-to-UART bridge, USB host and peripheral ports, and PCI Express. The only connection used during this project was the USB JTAG connector. It was used to program and debug the FPGA from the host computer. There are three clock sources on the board: a 200 MHz differential oscillator, a 66 MHz single-ended oscillator, and SMA connectors for an external clock. This project utilizes the 200 MHz oscillator. Peripherals on the ML605 board were useful for debugging purposes. The push buttons were used to trigger sections of code execution in ChipScope, such as reading from and writing to memory. Dip switches acted as configuration inputs to our code. For example, they acted as a safety to ensure the buttons on the board were not automatically set to active when the code was downloaded to the board. In addition, the value on the switches indicated which system would begin writing first for debugging purposes. LEDs were used to check the functionality of sections of code as well, and for additional validation they can be used to indicate if an error has occurred. Although we did not use it, the ML605 board provides an LCD. 3.2 TOOLS Now that the hardware where the design is placed has been described, the software used to manipulate the design can be described. The tools for design include those provided within the Xilinx Integrated Software Environment, and the tools used for validation include iSim and ChipScope. This section also looks at the turn-around time for both validation tools and what it means for the design process.
  • 23. 13 3.2.1 Xilinx Integrated Software Environment (ISE) We designed the arbiter using the Verilog hardware description language in the Xilinx Integrated Software Environment (ISE). ISE is an environment in which the user can "take [their] design from design entry through Xilinx device programming". The main workbench for ISE is ISE Project Navigator. The Project Navigator tool allows the user to effectively manage their design and call upon development processes. Figure 3.1 shows a screen shot of ISE Project Navigator. Figure 3.1 Screen Shot of ISE Project Navigator Figure 3.1 shows some of the main windows in ISE Project Navigator. On the right-hand side is the window for code entry. The hierarchical view of modules in the design appears on the left, and when implementation is selected at the top, the design implementation progress is shown in the bottom window. If simulation were selected instead of implementation, there would be an option to run the design in simulation. The main processes called upon by ISE are synthesis, implementation, and bit stream generation. During synthesis, Xilinx Synthesis Technology (XST) is called upon. XST synthesizes Verilog, VHDL or mixed-language designs and creates netlist files. Netlist files, or NGC files, contain the design logic and constraints.
  • 24. 14 They are saved for use in the implementation process. During synthesis, XST checks for synthesis errors (parsing) and infers macros from the code. When XST infers macros, it recognizes parts of the code that can be replaced with components from its library, such as MUXes and RAMs, and encodes them in a way that is best for reduced area and/or increased speed. Implementation is the longest process performed on the design. The first step of implementation is to combine the netlists and constraints into a design/NGD file. The NGD file is the design file reduced to Xilinx primitives. This process is called translation. During the second step, mapping, the design is fitted into the target device. This involves turning logic into FPGA elements such as configurable logic blocks. Mapping produces a native circuit description (NCD) file. The third step, place and route, uses the mapped NCD file to place and route the design according to the timing constraints. Finally, the program file is generated and, at the end of this step, a bit stream is ready to be downloaded to the board. 3.2.2 Synthesis and Simulation Once the design has been synthesized, simulation of the design is possible. Simulating a design enables verification of logic functionality and timing. We used the simulation tool in ISE (iSim) to view timing and signal values. In order to utilize iSim, we created a test bench to provide the design with stimulus. Since simulation only requires design synthesis, it is a relatively fast process. The short turn-around time of simulation means we were able to iteratively test small changes to the design and, therefore, debug our code efficiently. 3.2.3 Implementation and Hardware Validation Once the design was working in simulation, we still needed to test the design's functionality in hardware. Testing the design in hardware is the most reliable validation method. In order to download the design to the board, it first needs to be implemented in ISE.
  • 25. 15 Implementation has a much longer turn-around time than synthesis, so while functionality in hardware ensures the design is working, simulation is the practical choice for iterative verification. In order to test our design in hardware, we utilized ChipScope Pro Analyzer, a GUI which allows the user to "configure [their] device, choose triggers, setup the console, and view results of the capture on the fly". In order to use ChipScope Pro, you may either insert ChipScope Pro cores into the design using the Core Generator, a tool that can be accessed in ISE Project Navigator, or utilize the PlanAhead or Core Inserter tool, which automatically inserts cores into the design netlist for you. Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation One method of inserting ChipScope cores into the design is by utilizing the PlanAhead software. The PlanAhead tool enables the creation of floorplans.
  • 26. 16 Floorplans provide an initial view of "the design's interconnect flow and logic module sizes." This helps the designer to "avoid timing, utilization, and routing congestion issues." PlanAhead also allows the designer to create and configure I/O ports and analyze implementation results, which aids in the discovery of bottlenecks in the design. For our project, however, we utilized PlanAhead only for its ability to automatically insert ChipScope cores. PlanAhead proved to be inefficient for our purposes since, many times, when a change was made in the design the whole netlist would need to be selected again. In addition, there were bugs in the software that greatly affected the turn-around time of debugging, and it crashed several times. If PlanAhead were used for floorplanning and other design tasks, then it might have proved much more useful. In place of PlanAhead, we utilized the Core Generator within ISE. The ChipScope cores provided by Xilinx include ICON, ILA, VIO, ATC2, and IBERT. The designer can choose which cores to insert by using the Core Generator in ISE. The ICON core provides communication between the different cores and the computer running ChipScope. It can connect up to fifteen ILA, VIO, and ATC2 cores. The ILA core is used to synchronously monitor internal signals. It contains logic to trigger on inputs and outputs and to capture data. ILA cores allow up to sixteen trigger ports, which can be 1 to 256 bits wide. The VIO core can monitor signals like the ILA core, but can also drive internal FPGA signals in real time. The ATC2 core is similar to the ILA core, but was created for Agilent FPGA dynamic probe technology. Finally, the IBERT core contains "all the logic to control, monitor, and change transceiver parameters and perform bit error ratio tests." The only ChipScope cores we were concerned with in this project were the ICON and ILA cores. We inserted one ILA core and one ICON core using the Core Generator within ISE Project Navigator. The ILA core allowed us to monitor internal signals in the FPGA. Instead of inserting a VIO core, which allows inputs to and outputs from ChipScope, we used buttons to trigger the execution of write and read logic.
  • 27. 17 3.2.4 Analysis of Turn-Around Times As introduced in sections 3.2.2 and 3.2.3, implementation takes much longer than synthesis. Therefore, when it comes down to turn-around time, simulation is much more effective for iterative debugging. Figure 3.2 shows the phases for simulation and hardware validation, as well as the time it takes to complete each phase. For simulation, the process starts with Verilog code, which is synthesized and then, using a test bench, run in iSim for viewing. This process takes about eight minutes total. A system's simulation run-time is much longer than if it were running on hardware, but simulation is still faster than hardware validation because it does not have to undergo implementation. The bottleneck in our simulation process is the set-up time for the DDR3 memory model, which accounts for most of the simulation time. Hardware validation starts with Verilog code, which is synthesized, implemented, and imported into ChipScope. This whole process takes about fifteen minutes. Most of the time spent on hardware validation is on implementation of the design. In addition, hardware validation requires more of the user's attention. It is more difficult and takes more time to set up a ChipScope core than it does to create a test bench for simulation. While a test bench (green) involves writing some simple code, a ChipScope core (orange) involves setting up all the signals to be probed. Not only is simulation faster, but the iSim tool is easier to use than ChipScope. Figure 3.3 shows a screen shot of iSim.
  • 28. 18 Figure 3.3 iSim Screen Shot The screen shot of iSim shows the instance names in the first column, all the signals to choose from in the second, and the signals and their waveforms in the third and fourth columns. The user can view any signal without having to port it out of the design and re-implement, as is required when using ChipScope. When adding an additional signal in iSim, only the simulation needs to be restarted. The iSim interface makes debugging much easier with collapsible signal viewing, grouping abilities, and a large window for viewing many signals at once. A screen shot of ChipScope is shown in Figure 3.4. In ChipScope, you can view the devices, signals, triggers, and waveform windows. The length of time ChipScope is able to capture is much less than that of iSim. For this reason, triggers are required to execute different parts of the code; this is where the buttons were utilized. If a signal could not fit into the allowable number of signal inputs or was forgotten, it would need to be added to the design and implemented all over again, a much longer turn-around time than simulation. Therefore, simulation is used for iterative debugging and functionality testing, while hardware validation is the next step to ensure design accuracy.
  • 29. 19 Figure 3.4 ChipScope Screen Shot 3.2.5 Xilinx Core Generator One tool in ISE that was very important to our project was the CORE Generator. The CORE Generator provided us with not only the ChipScope cores, but the memory controller and FIFOs as well. The CORE Generator can be accessed within ISE Project Navigator and provides many additional functions for the designer. The options provided for creating FIFOs, for example, include common or independent clocks, first-word fall-through, a variety of flags to indicate the amount of data in the FIFO, and the write width, read width, and depth. The different width capabilities allowed us to create asynchronous FIFOs. The memory controller was created using the Xilinx Memory Interface Generator (MIG). There were options to use an AXI4, native, or user interface, which is discussed in a following section on interfacing with the Xilinx MIG.
  • 30. 20 CHAPTER 4 ARCHITECTURE The SDR SDRAM Controller consists of four main modules: the SDRAM controller, control interface, command, and data path modules. The SDRAM controller module is the top-level module that instantiates the three lower modules and brings the whole design together. The control interface module accepts commands and related memory addresses from the host, decoding the command and passing the request to the command module. The command module accepts commands and addresses from the control interface module, and generates the proper commands to the SDRAM. The data path module handles the data path operations during WRITEA and READA commands. The SDRAM controller module also instantiates a PLL that is used in the CLOCK_LOCK mode to improve I/O timing. This PLL is not essential to the operation of the SDR SDRAM Controller and can be easily removed. Figure 4 Architecture of SDRAM controller
  • 31. 21 4.1 CONTROL INTERFACE MODULE The control interface module decodes and registers commands from the host, and passes the decoded NOP, WRITEA, READA, REFRESH, PRECHARGE, and LOAD_MODE commands, and ADDR to the command module. The LOAD_REG1 and LOAD_REG2 commands are decoded and used internally to load the REG1 and REG2 registers with values from ADDR. Figure 4.1 shows the control interface module block diagram. Figure 4.1 Control Interface Module
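As a concrete illustration of the decode step described above, the following Verilog sketch registers the host command and address and produces one strobe per decoded command. The module name, port widths, and the CMD[2:0] encodings are assumptions made for illustration (Table 5.3 defines the actual command set); this is a minimal sketch, not the controller's actual source.

module cmd_decode (
    input             clk, reset_n,
    input       [2:0] cmd,       // host command input
    input      [22:0] addr,      // host address (width assumed)
    output reg        nop, reada, writea, refresh, precharge, load_mode,
    output reg [22:0] saddr      // registered address passed to the command module
);
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            {nop, reada, writea, refresh, precharge, load_mode} <= 6'b0;
            saddr <= 23'b0;
        end else begin
            saddr     <= addr;               // register the address with the command
            nop       <= (cmd == 3'b000);    // assumed encodings for illustration
            reada     <= (cmd == 3'b001);
            writea    <= (cmd == 3'b010);
            refresh   <= (cmd == 3'b011);
            precharge <= (cmd == 3'b100);
            load_mode <= (cmd == 3'b101);
        end
    end
endmodule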
  • 32. 22 The control interface module also contains a 16-bit down counter and control circuit that is used to generate periodic refresh commands to the command module. The 16-bit down counter is loaded with the value from REG2 and counts down to zero. The REFRESH_REQ output is asserted when the counter reaches zero and remains asserted until the command module acknowledges the request. The acknowledge from the command module causes the down counter to be reloaded with REG2 and the process repeats. REG2 is a 16-bit value that represents the period between REFRESH commands that the SDR SDRAM Controller issues. The value is set by the equation int(refresh_period/clock_period). For example, if an SDRAM device that is connected to the SDR SDRAM Controller has a 64-ms, 4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked by a 100-MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562d. 4.2 COMMAND MODULE The command module accepts decoded commands from the control interface module, refresh requests from the refresh control logic, and generates the appropriate commands to the SDRAM. The module contains a simple arbiter that arbitrates between the commands from the host interface and the refresh requests from the refresh control logic. The refresh requests from the refresh control logic have priority over the commands from the host interface. If a command from the host arrives at the same time as or during a hidden refresh operation, the arbiter holds off the host by not asserting CMDACK until the hidden refresh operation is complete. If a hidden refresh command is received while a host operation is in progress, the hidden refresh is held off until the host operation is complete. Figure 4.2 shows the command module block diagram.
  • 33. 23 Figure 4.2 Command Module Block Diagram After the arbiter has accepted a command from the host, the command is passed on to the command generator portion of the command module. The command module uses three shift registers to generate the appropriate timing between the commands that are issued to the SDRAM. One shift register is used to control the timing of the ACTIVATE command; a second is used to control the positioning of the READA or WRITEA commands; a third is used to time command durations, which allows the arbiter to determine if the last requested operation has been completed. The command module also performs the multiplexing of the address to the SDRAM. The row portion of the address is multiplexed out to the SDRAM address outputs A[11:0] during the ACTIVATE (RAS) command. The column portion is then multiplexed out to the SDRAM address outputs during a READA (CAS) or WRITEA command. The output signal OE is generated by the command module to control the tristate buffers in the last stage of the DATAIN path in the data path module.
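The shift-register technique used to position the READA or WRITEA (CAS) command relative to the ACTIVATE (RAS) command can be sketched as follows. This is a simplified illustration of the idea (a one-hot token delayed by RCD clock cycles), assuming RCD is at least 2; the names and the exact structure are assumptions, not the actual command module.

module cas_position #(parameter RCD = 3) (  // RAS-to-CAS delay in clocks, RCD >= 2 assumed
    input  clk, reset_n,
    input  start,        // pulses when the arbiter accepts a read or write
    output do_activate,  // drive the ACTIVATE (RAS) command this cycle
    output do_cas        // drive the READA/WRITEA (CAS) command this cycle
);
    reg [RCD-1:0] shift;
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) shift <= {RCD{1'b0}};
        else          shift <= {shift[RCD-2:0], start};   // shift the one-hot token along
    end
    assign do_activate = start;          // row address is multiplexed out in this cycle
    assign do_cas      = shift[RCD-1];   // column address follows RCD cycles later
endmodule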
  • 34. 24 4.3 DATA PATH MODULE The data path module provides the SDRAM data interface to the host. Host data is accepted on DATAIN for WRITEA commands and data is provided to the host on DATAOUT during READA commands. Figure 4.3 shows the data path module block diagram. Figure 4.3 Data Path Module The DATAIN path consists of a 2-stage pipeline to align data properly relative to CMDACK and the commands that are issued to the SDRAM. The DATAOUT path consists of a 2-stage pipeline that registers data from the SDRAM during a READA command. The DATAOUT pipeline delay can be reduced to one or even zero registers, with the only effect being that the relationship of DATAOUT to CMDACK changes.
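A minimal Verilog sketch of the two 2-stage pipelines described above is shown below. The data width, port names, and the use of a single OE signal to control the tristate drivers are assumptions for illustration; the report's data path module may be organized differently.

module sdr_data_path (
    input             clk, reset_n,
    input      [15:0] datain,    // write data from the host
    output reg [15:0] dataout,   // read data to the host
    input             oe,        // output enable from the command module
    inout      [15:0] dq         // bidirectional SDRAM data pins
);
    reg [15:0] din1, din2;       // DATAIN pipeline stages
    reg [15:0] dout1;            // first DATAOUT pipeline stage
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            din1 <= 16'b0; din2 <= 16'b0;
            dout1 <= 16'b0; dataout <= 16'b0;
        end else begin
            din1    <= datain;   // stage 1: align write data to the issued command
            din2    <= din1;     // stage 2: final register before the pins
            dout1   <= dq;       // stage 1: capture read data from the SDRAM
            dataout <= dout1;    // stage 2: present read data to the host
        end
    end
    assign dq = oe ? din2 : 16'bz;   // tristate drivers controlled by OE
endmodule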
  • 35. 25 CHAPTER 5 OPERATION The single data rate (SDR) synchronous dynamic random access memory (SDRAM) controller provides a simplified interface to industry standard SDR SDRAM. The SDR SDRAM Controller is available in either Verilog HDL or VHDL and is optimized for the architecture. The SDR SDRAM Controller supports the following features: - Burst lengths of 1, 2, 4, or 8 data words. - CAS latency of 2 or 3 clock cycles. - 16-bit programmable refresh counter used for automatic refresh. - Two chip selects for SDRAM devices. - Support for the NOP, READA, WRITEA, AUTO_REFRESH, PRECHARGE, ACTIVATE, BURST_STOP, and LOAD_MR commands. - Support for full-page mode operation. - Data mask line for write operations. - PLL to increase system performance. Figure 5 SDR SDRAM Controller System-Level Diagram
  • 36. 26 5.1 SDRAM OVERVIEW SDRAM is high-speed dynamic random access memory (DRAM) with a synchronous interface. The synchronous interface and fully-pipelined internal architecture of SDRAM allow extremely fast data rates if used efficiently. Internally, SDRAM devices are organized in banks of memory, which are addressed by row and column. The number of row- and column-address bits and the number of banks depend on the size of the memory. SDRAM is controlled by bus commands that are formed using combinations of the RASN, CASN, and WEN signals. For instance, on a clock cycle where all three signals are high, the associated command is a no operation (NOP). A NOP is also indicated when the chip select is not asserted. Table 5.1 shows the standard SDRAM bus commands. Table 5.1 SDRAM Bus Commands SDRAM banks must be opened before a range of addresses can be written to or read from. The row and bank to be opened are registered coincident with the ACT command. When a bank is accessed for a read or a write, it may be necessary to close the bank and re-open it if the row to be accessed is different from the row that is currently open. Closing a bank is done with the PCH command.
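For reference, the standard JEDEC bus-command encodings summarized in Table 5.1 can be written as Verilog constants over {CSN, RASN, CASN, WEN}. These values reflect the common SDR SDRAM truth table and are the kind of declarations that would sit inside the command module; confirm them against the data sheet of the specific device used, since this listing is not taken from the report itself.

localparam CMD_NOP        = 4'b0111;  // no operation (also any cycle with CSN = 1)
localparam CMD_ACTIVE     = 4'b0011;  // ACT: open a row in a bank
localparam CMD_READ       = 4'b0101;  // RD: begin a read burst
localparam CMD_WRITE      = 4'b0100;  // WR: begin a write burst
localparam CMD_BURST_TERM = 4'b0110;  // BT: terminate a burst
localparam CMD_PRECHARGE  = 4'b0010;  // PCH: close a row
localparam CMD_AUTO_REF   = 4'b0001;  // ARF: auto refresh
localparam CMD_LOAD_MODE  = 4'b0000;  // LMR: load the mode register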
  • 37. 27 The primary commands used to access SDRAM are RD and WR. When the WR command is issued, the initial column address and data word are registered. When a RD command is issued, the initial address is registered. The initial data appears on the data bus 1 to 3 clock cycles later. This is known as CAS latency and is due to the time required to physically read the internal DRAM core and register the data on the bus. The CAS latency depends on the speed of the SDRAM and the frequency of the memory clock. In general, the faster the clock, the more cycles of CAS latency are required. After the initial RD or WR command, sequential reads and writes continue until the burst length is reached or a BT command is issued. SDRAM memory devices support burst lengths of 1, 2, 4, or 8 data cycles. The ARF command is issued periodically to ensure data retention. This function is performed by the SDR SDRAM Controller and is transparent to the user. The LMR command is used to configure the SDRAM mode register, which stores the CAS latency, burst length, burst type, and write burst mode. Consult the SDRAM specification for additional details. SDRAM comes in dual in-line memory modules (DIMMs), small-outline DIMMs (SO-DIMMs) and chips. To reduce pin count, SDRAM row and column addresses are multiplexed on the same pins. SDRAM often includes more than one bank of memory internally, and DIMMs may require multiple chip selects. 5.2 FUNCTIONAL DESCRIPTION Table 5.2 shows the SDR SDRAM Controller interface signals. All signals are synchronous to the system clock and outputs are registered at the SDR SDRAM Controller's outputs.
  • 38. 28 Table 5.2 Interface Signals 5.3 SDRAM CONTROLLER COMMAND INTERFACE The SDR SDRAM Controller provides a synchronous command interface to the SDRAM and several control registers. Table 5.3 shows the commands, which are described in the following sections. The following rules apply to the commands, with reference to Table 5.2: - All commands, except NOP, are driven by the user onto CMD[2:0]; ADDR and DATAIN are set appropriately for the requested command. The controller registers the command on the next rising clock edge.
  • 39. 29 - To acknowledge the command, the controller asserts CMDACK for one clock period. - For READA or WRITEA commands, the user should start receiving or writing data on DATAOUT and DATAIN. - The user must drive NOP onto CMD[2:0] by the next rising clock edge after CMDACK is asserted (a minimal host-side sketch of this handshake is given below, after the NOP command). Table 5.3 Interface Commands 5.3.1 NOP Command NOP is a no-operation command to the controller. When NOP is detected by the controller, it performs a NOP in the following clock cycle. A NOP must be issued in the clock cycle after the controller has acknowledged a command. The NOP command has no effect on SDRAM accesses that are already in progress.
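The host-side handshake implied by the rules above can be sketched as follows: present a command and address, hold them until CMDACK is seen, and drive NOP on the next rising edge. The command encodings, signal names, and the single-command state machine are assumptions made for illustration; only READA is shown.

module host_cmd_issue (
    input             clk, reset_n,
    input             go,          // pulse to issue one READA
    input      [22:0] go_addr,
    input             cmdack,      // acknowledge from the SDR SDRAM Controller
    output reg  [2:0] cmd,
    output reg [22:0] addr
);
    localparam NOP = 3'b000, READA = 3'b001;   // assumed encodings (see Table 5.3)
    reg busy;
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            cmd <= NOP; addr <= 23'b0; busy <= 1'b0;
        end else if (go && !busy) begin
            cmd  <= READA;                     // present command and address together
            addr <= go_addr;
            busy <= 1'b1;                      // hold them until CMDACK
        end else if (busy && cmdack) begin
            cmd  <= NOP;                       // drive NOP on the clock after CMDACK
            busy <= 1'b0;
        end
    end
endmodule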
  • 40. 30 5.3.2 READA Command Figure 5.1 Timing diagram for a READA command The READA command instructs the SDR SDRAM Controller to perform a burst read with auto-precharge to the SDRAM at the memory address specified by ADDR. The SDR SDRAM Controller issues an ACTIVATE command to the SDRAM followed by a READA command. The read burst data first appears on DATAOUT (RCD + CL + 2) clocks after the SDR SDRAM Controller asserts CMDACK. During a READA command the user must keep DM low. When the controller is configured for full-page mode, the READA command becomes READ (read without auto-precharge). Figure 5.1 shows an example timing diagram for a READA command.
  • 41. 31 The following sequence describes the general operation of the READA command: - The user asserts READA, ADDR, and DM. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - One clock after CMDACK is asserted, the user must assert NOP. - After CMDACK, the controller presents the first read burst value on DATAOUT; the remainder of the read burst follows on every clock cycle. 5.3.3 WRITEA Command Figure 5.2 Timing diagram for a WRITEA command The WRITEA command instructs the SDR SDRAM Controller to perform a burst write with auto-precharge to the SDRAM at the memory address specified by ADDR.
  • 42. 32 The SDR SDRAM Controller will issue an ACTIVATE command to the SDRAM followed by a WRITEA command. The first data value in the burst sequence must be presented with WRITEA and the ADDR address. The host must start clocking data, along with the desired DM values, into the SDR SDRAM Controller (tRCD – 2) clocks after the SDR SDRAM Controller has acknowledged the WRITEA command. See an SDRAM data sheet for how to use the data mask lines DM/DQM. When the SDR SDRAM Controller is in full-page mode, WRITEA becomes WRITE (write without auto-precharge). Figure 5.2 shows an example timing diagram for a WRITEA command. The following sequence describes the general operation of a WRITEA command: - The user asserts WRITEA, ADDR, the first write data value on DATAIN, and the desired data mask value on DM, with reference to Tables 5.2 and 5.3. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - One clock after CMDACK is asserted, the user asserts NOP on CMD. - The user clocks data and data mask values into the SDR SDRAM Controller through DATAIN and DM.
  • 43. 33 Figure 5.3 Timing diagram for a REFRESH command The following sequence describes the general operation of a REFRESH command: - The user asserts REFRESH on the CMD input. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - The user asserts NOP on CMD.
  • 44. 34 5.3.5 PRECHARGE Command Figure 5.4 Timing diagram for a PRECHARGE command The PRECHARGE command instructs the SDR SDRAM Controller to perform a PCH command to the SDRAM. The SDR SDRAM Controller acknowledges the command with CMDACK. The PCH command is also used to generate a burst stop to the SDRAM. Using PRECHARGE to terminate a burst is only supported in full-page mode. Note that the SDR SDRAM Controller adds a latency of 4 clocks from when the host issues the command to when the SDRAM sees the PRECHARGE command. If a full-page read burst is to be stopped after 100 cycles, the PRECHARGE command must be asserted (4 + CL – 1) clocks before the desired end of the burst (the CL – 1 requirement is imposed by the SDRAM devices). So if the CAS latency is 3, the PRECHARGE command must be issued (100 – 3 – 1 – 4) = 92 clocks into the burst.
  • 45. 35 Figure 5.4 shows an example timing diagram of the PRECHARGE command. The following sequence describes the general operation of a PRECHARGE command: - The user asserts PRECHARGE on CMD. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - The user asserts NOP on CMD. 5.3.6 LOAD_MODE Command The LOAD_MODE command instructs the SDR SDRAM Controller to perform an LMR command to the SDRAM. The value that is to be written into the SDRAM mode register must be present on ADDR[11:0] with the LOAD_MODE command. The value on ADDR[11:0] is mapped directly to the SDRAM pins A11–A0 when the SDR SDRAM Controller issues the LMR to the SDRAM. Figure 5.5 shows an example timing diagram. The following sequence describes the general operation of a LOAD_MODE command: - The user asserts LOAD_MODE on CMD. - The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices. - One clock after the SDR SDRAM Controller asserts CMDACK, the user asserts NOP on CMD.
  • 46. 36 Figure 5.5 Timing diagram for a LOAD_MODE Command 5.3.7 LOAD_REG1 Command The LOAD_REG1 command instructs the SDR SDRAM Controller to load the internal configuration register REG1 with the value presented on ADDR; the REG1 bit fields are defined in Table 5.4 and described below. Table 5.4 REG1 Bit Definitions
  • 47. 37 CL is the CAS latency of the SDRAM memory in clock periods and is dependent on the memory device speed grade and clock frequency. Consult the SDRAM data sheet for appropriate settings. CL must be set to the same value as the CL of the SDRAM memory devices. RCD is the RAS-to-CAS delay in clock periods and is dependent on the SDRAM speed grade and clock frequency. RCD = INT(tRCD/clock_period), where tRCD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock. RRD is the refresh-to-RAS delay in clock periods. RRD is dependent on the SDRAM speed grade and clock frequency. RRD = INT(tRRD/clock_period), where tRRD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock. PM is the page mode bit. If PM = 0, the SDR SDRAM Controller operates in non-page mode. If PM = 1, the SDR SDRAM Controller operates in page mode. See the section "Full-Page Mode Operation" for more information. BL is the burst length the SDRAM devices have been configured for. 5.3.8 LOAD_REG2 Command The LOAD_REG2 command instructs the SDR SDRAM Controller to load the internal configuration register REG2. REG2 is a 16-bit value that represents the period between REFRESH commands that the SDR SDRAM Controller issues. The value is set by the equation int(refresh_period/clock_period). For example, if an SDRAM device connected to the SDR SDRAM Controller has a 64-ms, 4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked by a 100 MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562d. The value that is to be written into REG2 must be presented on the ADDR input simultaneously with the assertion of the LOAD_REG2 command.
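To make the arithmetic above concrete, the register fields can be computed at elaboration time with Verilog localparams, as sketched below. The tRCD and tRRD figures (20 ns and 15 ns) are example data-sheet values assumed for illustration and are not taken from this report; only the 64 ms/4096 refresh case and the 100 MHz clock match the worked example above. Declarations of this kind could live in the host logic or a shared parameter header.

localparam CLK_PERIOD_NS = 10;                         // 100 MHz controller/SDRAM clock
localparam CL   = 3;                                   // CAS latency, from the SDRAM data sheet
localparam RCD  = 20 / CLK_PERIOD_NS;                  // INT(tRCD/clock_period) = 2, assuming tRCD = 20 ns
localparam RRD  = 15 / CLK_PERIOD_NS;                  // INT(tRRD/clock_period) = 1, assuming tRRD = 15 ns
localparam REG2 = (64_000_000 / 4096) / CLK_PERIOD_NS; // 15625 ns / 10 ns = 1562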
  • 48. 38 CHAPTER 6 ELEMENTS OF MEMORY BANK 6.1 DECODER A decoder is a device which does the reverse operation of an encoder, undoing the encoding so that the original information can be retrieved. The same method used to encode is usually just reversed in order to decode. It is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines. 6.1.1 A 2-to-4 line single-bit decoder In digital electronics, a decoder can take the form of a multiple-input, multiple-output logic circuit that converts coded inputs into coded outputs, where the input and output codes are different, e.g. n-to-2^n and binary-coded decimal decoders. Enable inputs must be on for the decoder to function; otherwise its outputs assume a single "disabled" output code word. Decoding is necessary in applications such as data multiplexing, 7-segment displays, and memory address decoding. A simple example of a decoder circuit is an AND gate, because the output of an AND gate is "high" (1) only when all its inputs are "high". Such an output is called an "active-high output". If a NAND gate is connected instead of the AND gate, the output will be "low" (0) only when all its inputs are "high". Such an output is called an "active-low output". A slightly more complex decoder is the n-to-2^n type binary decoder. These types of decoders are combinational circuits that convert binary information from 'n' coded inputs to a maximum of 2^n unique outputs. We say a maximum of 2^n outputs because, in case the 'n'-bit coded information has unused bit combinations, the decoder may have fewer than 2^n outputs.
  • 49. 39 We can have a 2-to-4 decoder, a 3-to-8 decoder or a 4-to-16 decoder. We can form a 3-to-8 decoder from two 2-to-4 decoders (with enable signals). Figure 6.1 RTL of Decoder Similarly, we can also form a 4-to-16 decoder by combining two 3-to-8 decoders. In this type of circuit design, the enable inputs of both 3-to-8 decoders originate from a 4th input, which acts as a selector between the two 3-to-8 decoders. This allows the 4th input to enable either the top or the bottom decoder, which produces outputs D(0) through D(7) for the first decoder, and D(8) through D(15) for the second decoder. Figure 6.2 Simulation of Decoder
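A behavioural Verilog sketch of the 2-to-4 single-bit decoder with an enable input, in the spirit of the RTL shown in Figure 6.1, is given below; the module and port names are illustrative and may differ from the report's source.

module decoder2to4 (
    input        en,       // enable: all outputs low when de-asserted
    input  [1:0] a,        // 2-bit binary input
    output [3:0] d         // one-hot, active-high outputs
);
    assign d = en ? (4'b0001 << a) : 4'b0000;
endmodule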
  • 50. 40 A decoder that contains enable inputs is also known as a decoder-demultiplexer. Thus, we have a 4-to-16 decoder produced by adding a 4th input shared among both decoders, producing 16 outputs. 6.2 DEMUX The data distributor, known more commonly as a demultiplexer or "demux" for short, is the exact opposite of the multiplexer described in Section 6.4. The demultiplexer converts a serial data signal at its input into parallel data at its outputs. Figure 6.3 RTL of DEMUX
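A matching 1-to-4 demultiplexer sketch is shown below: the single data input is routed to the output line chosen by the select bits, while the remaining outputs stay low. Names and widths are illustrative, and the RTL in Figure 6.3 may be organized differently.

module demux1to4 (
    input        din,      // single data input
    input  [1:0] sel,      // selects which output receives din
    output [3:0] dout      // unselected outputs are driven low
);
    assign dout = {3'b000, din} << sel;
endmodule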
  • 51. 41 The demultiplexer takes one single input data line and switches it to any one of a number of individual output lines, one at a time. Figure 6.4 Simulation of DEMUX 6.3 RAM Random-access memory (RAM) is a form of computer data storage. A random-access memory device allows data items to be read and written in roughly the same amount of time regardless of the order in which data items are accessed. In contrast, with other direct-access data storage media such as hard disks, CD-RWs, DVD-RWs and the older drum memory, the time required to read and write data items varies significantly depending on their physical locations on the recording medium, due to mechanical limitations such as media rotation speeds and arm movement delays. Today, random-access memory takes the form of integrated circuits. Strictly speaking, modern types of DRAM are not random access, as data is read in bursts, although the name DRAM/RAM has stuck. However, many types of SRAM are still random access even in a strict sense.
  • 52. 42 RAM is normally associated with volatile types of memory (such as DRAM memory modules), where stored information is lost if the power is removed, although many efforts have been made to develop non-volatile RAM chips. Other types of non-volatile memory exist that allow random access for read operations, but either do not allow write operations or have limitations on them. These include most types of ROM and a type of flash memory called NOR flash. 6.3.1 TYPES OF RAM The two main forms of modern RAM are static RAM (SRAM) and dynamic RAM (DRAM). In SRAM, a bit of data is stored using the state of a flip-flop. This form of RAM is more expensive to produce, but is generally faster and requires less power than DRAM and, in modern computers, is often used as cache memory for the CPU. DRAM stores a bit of data using a transistor and capacitor pair, which together comprise a memory cell. The capacitor holds a high or low charge (1 or 0, respectively), and the transistor acts as a switch that lets the control circuitry on the chip read the capacitor's state of charge or change it. As this form of memory is less expensive to produce than static RAM, it is the predominant form of computer memory used in modern computers. Figure 6.5 RTL of RAM
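A small synchronous single-port RAM sketch of the kind shown in Figure 6.5 is given below. The depth, width, and read-during-write behaviour (the old data is returned) are assumptions made for illustration, not the exact block used in the report.

module ram_sp #(parameter AW = 4, DW = 8) (
    input               clk,
    input               we,                 // write enable
    input      [AW-1:0] addr,
    input      [DW-1:0] din,
    output reg [DW-1:0] dout
);
    reg [DW-1:0] mem [0:(1<<AW)-1];         // 2^AW words of DW bits
    always @(posedge clk) begin
        if (we) mem[addr] <= din;           // synchronous write
        dout <= mem[addr];                  // registered read (returns old data on a write)
    end
endmodule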
  • 53. 43 Both static and dynamic RAM are considered volatile, as their state is lost or reset when power is removed from the system. By contrast, read-only memory (ROM) stores data by permanently enabling or disabling selected transistors, such that the memory cannot be altered. Writeable variants of ROM (such as EEPROM and flash memory) share properties of both ROM and RAM, enabling data to persist without power and to be updated without requiring special equipment. These persistent forms of semiconductor ROM include USB flash drives, memory cards for cameras and portable devices, etc. ECC memory (which can be either SRAM or DRAM) includes special circuitry to detect and/or correct random faults (memory errors) in the stored data, using parity bits or error correction codes. In general, the term RAM refers solely to solid-state memory devices (either DRAM or SRAM), and more specifically the main memory in most computers. In optical storage, the term DVD-RAM is somewhat of a misnomer since, unlike CD-RW or DVD-RW, it does not need to be erased before reuse. Nevertheless, a DVD-RAM behaves much like a hard disk drive, if somewhat slower. Figure 6.6 Simulation of RAM
  • 54. 44 6.4 MUX In electronics, a multiplexer is a device that selects one of several analog or digital input signals and forwards the selected input onto a single line. A multiplexer of 2^n inputs has n select lines, which are used to select which input line to send to the output. Multiplexers are mainly used to increase the amount of data that can be sent over the network within a certain amount of time and bandwidth. A multiplexer is also called a data selector. Figure 6.7 RTL of MUX An electronic multiplexer can be considered a multiple-input, single-output switch, and a demultiplexer a single-input, multiple-output switch. The schematic symbol for a multiplexer is an isosceles trapezoid with the longer parallel side containing the input pins and the shorter parallel side containing the output pin.
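A 4-to-1 multiplexer sketch for the data selector described above follows; a 2-to-1 version uses the same pattern with a single select bit. The port names and data width are illustrative.

module mux4to1 #(parameter DW = 8) (
    input  [DW-1:0] in0, in1, in2, in3,
    input  [1:0]    sel,
    output [DW-1:0] out
);
    assign out = (sel == 2'b00) ? in0 :
                 (sel == 2'b01) ? in1 :
                 (sel == 2'b10) ? in2 : in3;   // sel == 2'b11 selects in3
endmodule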
  • 55. 45 The schematic shows a 2-to-1 multiplexer on the left and an equivalent switch on the right. The wire connects the desired input to the output. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal. Figure 6.8 Simulation of MUX 6.5 BUFFER A buffer amplifier (sometimes simply called a buffer) is one that provides electrical impedance transformation from one circuit to another. Two main types of buffer exist: the voltage buffer and the current buffer.
  • 56. 46 6.5.1 VOLTAGE BUFFER A voltage buffer amplifier is used to transfer a voltage from a first circuit, having a high output impedance level, to a second circuit with a low input impedance level. The interposed buffer amplifier prevents the second circuit from loading the first circuit unacceptably and interfering with its desired operation. In an ideal voltage buffer, the input resistance is infinite and the output resistance is zero (the impedance of an ideal voltage source is zero). Other properties of the ideal buffer are: perfect linearity, regardless of signal amplitudes; and instant output response, regardless of the speed of the input signal. If the voltage is transferred unchanged (the voltage gain Av is 1), the amplifier is a unity gain buffer, also known as a voltage follower because the output voltage follows or tracks the input voltage. Although the voltage gain of a voltage buffer amplifier may be (approximately) unity, it usually provides considerable current gain and thus power gain. However, it is commonplace to say that it has a gain of 1 (or the equivalent 0 dB), referring to the voltage gain. As an example, consider a Thévenin source (voltage VA, series resistance RA) driving a resistor load RL. Because of voltage division (also referred to as "loading"), the voltage across the load is only VA RL / (RL + RA). However, if the Thévenin source drives a unity gain buffer, the voltage input to the amplifier is VA, with no voltage division, because the amplifier input resistance is infinite. At the output, the dependent voltage source delivers voltage Av VA = VA to the load, again without voltage division, because the output resistance of the buffer is zero. A Thévenin equivalent circuit of the combined original Thévenin source and the buffer is an ideal voltage source VA with zero Thévenin resistance. Figure 6.9 RTL of Buffer
  • 57. 47 6.5.2 CURRENT BUFFER Typically a current buffer amplifier is used to transfer a current from a first circuit, having a low output impedance level, to a second circuit with a high input impedance level. The interposed buffer amplifier prevents the second circuit from loading the first circuit unacceptably and interfering with its desired operation. In the ideal current buffer, the input impedance is zero and the output impedance is infinite (the impedance of an ideal current source is infinite). Again, other properties of the ideal buffer are: perfect linearity, regardless of signal amplitudes; and instant output response, regardless of the speed of the input signal. For a current buffer, if the current is transferred unchanged (the current gain βi is 1), the amplifier is again a unity gain buffer; this time known as a current follower because the output current follows or tracks the input current. Figure 6.10 Simulation of Buffer
  • 58. 48 As an example, consider a Norton source (current IA, parallel resistance RA) driving a resistor load RL. Because of current division (also referred to as "loading"), the current delivered to the load is only IA RA / (RL + RA). However, if the Norton source drives a unity gain current buffer, the current input to the amplifier is IA, with no current division, because the amplifier input resistance is zero. At the output, the dependent current source delivers current βi IA = IA to the load, again without current division, because the output resistance of the buffer is infinite. A Norton equivalent circuit of the combined original Norton source and the buffer is an ideal current source IA with infinite Norton resistance. 6.6 MEMORY BANK A memory bank is a logical unit of storage in electronics, which is hardware dependent. In a computer, the memory bank may be determined by the memory access controller along with the physical organization of the hardware memory slots. In a typical synchronous dynamic random-access memory (SDRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM), a bank consists of multiple rows and columns of storage units and is usually spread out across several chips. In a single read or write operation, only one bank is accessed; therefore, the number of bits in a column or a row, per bank and per chip, equals the memory bus width in bits (for a single channel). The size of a bank is further determined by the number of bits in a column and a row per chip, multiplied by the number of chips in a bank.
  • 59. 49 Figure 6.11 RTL Of Memory Bank Some computers have several identical memory banks of RAM, and use bank switching to switch between them. Harvard architecture computers have (at least) 2 very different banks of memory, one for program storage and one for data storage.
  • 60. 50 Figure 6.12 Simulation Of Memory Bank
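Putting the pieces together, a small memory bank can be assembled structurally from the sketches given in the preceding sections: the decoder, driven by the bank-select bits, enables one of four RAM blocks, and a multiplexer returns the selected block's read data. This composition is illustrative only and is not the RTL of Figure 6.11; it reuses the hypothetical decoder2to4, ram_sp and mux4to1 modules sketched earlier.

module memory_bank #(parameter AW = 4, DW = 8) (
    input           clk,
    input           we,          // write enable for the selected bank
    input  [1:0]    bank_sel,    // selects one of four banks
    input  [AW-1:0] addr,        // address within the selected bank
    input  [DW-1:0] din,
    output [DW-1:0] dout
);
    wire [3:0]    en;                    // one-hot bank enables from the decoder
    wire [DW-1:0] q0, q1, q2, q3;        // per-bank read data

    decoder2to4 u_dec (.en(1'b1), .a(bank_sel), .d(en));

    ram_sp #(.AW(AW), .DW(DW)) u_b0 (.clk(clk), .we(we & en[0]), .addr(addr), .din(din), .dout(q0));
    ram_sp #(.AW(AW), .DW(DW)) u_b1 (.clk(clk), .we(we & en[1]), .addr(addr), .din(din), .dout(q1));
    ram_sp #(.AW(AW), .DW(DW)) u_b2 (.clk(clk), .we(we & en[2]), .addr(addr), .din(din), .dout(q2));
    ram_sp #(.AW(AW), .DW(DW)) u_b3 (.clk(clk), .we(we & en[3]), .addr(addr), .din(din), .dout(q3));

    mux4to1 #(.DW(DW)) u_mux (.in0(q0), .in1(q1), .in2(q2), .in3(q3), .sel(bank_sel), .out(dout));
endmodule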
  • 61. 51 CHAPTER 7 RESULTS AND CONCLUSIONS 7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON 7.1.1 Project Table 7.1 Project 7.1.2 Device Table 7.2 Device
  • 62. 52 7.1.3 Environment Table 7.3 Environment 7.1.4 Default Activity Table 7.4 Default Activity
  • 63. 53 7.1.5 On-Chip Power Summary Table 7.5 On-Chip Power Summary 7.1.6 Thermal Summary Table 7.6 Thermal Summary 7.1.7 Power Supply Summary Table 7.7 Power Supply Summary
  • 64. 54 Table 7.8 Power Supply Current 7.1.8 Confidence Level Table 7.9 Confidence Level
  • 65. 55 7.1.9 By Hierarchy Table 7.10 By Hierarchy
  • 66. 56 7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE 7.2.1. Project Table 7.11 Project 7.2.2 Device Table 7.12 Device
  • 67. 57 7.2.3 Environment Table 7.13 Environment 7.2.4 Default Activity Rates Table 7.14 Default Activity
  • 68. 58 7.2.5 On-Chip Power Summary Table 7.15 On-Chip Power Summary 7.2.6 Thermal Summary Table 7.16 Thermal Summary 7.2.7 Power Supply Summary Table 7.17 Power Supply Summary
  • 69. 59 Table 7.18 Power Supply Current 7.2.8 Confidence Level Table 7.19 Confidence Level
  • 70. 60 7.2.9 By Hierarchy Table 7.20 By Hierarchy 7.3 CONCLUSION This project addresses the problem of finding a memory map for firm real-time workloads in the context of SDRAM memory controllers. Existing controllers use either a static memory map or provide only limited configurability. We use the number of banks that requests are interleaved over as a flexible configuration parameter, while previous work considers it a fixed part of the controller architecture. We use this degree of freedom to optimize the memory configuration for the mix of applications and their requirements. This is beneficial for the worst-case performance in terms of bandwidth, latency and power.
  • 71. 61 CHAPTER 8 FUTURE SCOPE The advantages of this controller compared to SDR SDRAM, DDR1 SDRAM and DDR2 SDRAM controllers are that it synchronizes the data transfer, the data transfer is twice as fast as before, and the production cost is also very low. We have successfully designed the controller using Verilog HDL and synthesized it using the Xilinx tool. 1. DDR4 SDRAM is the 4th generation of DDR SDRAM. 2. DDR3 SDRAM improves on DDR SDRAM by using differential signalling and lower voltages to support significant performance advantages over DDR SDRAM. 3. DDR3 SDRAM standards are still being developed and improved.
  • 72. 62 REFERENCES [1] C. van Berkel, “Multi-core for Mobile Phones,” in Proc. DATE, 2009. [2] “International Technology Roadmap for Semiconductors (ITRS),” 2009. [3] P. Kollig et al., “Heterogeneous Multi-Core Platform for Consumer Multimedia Applications,” in Proc. DATE, 2009. [4] L. Steffens et al., “Real-Time Analysis for Memory Access in Media Processing SoCs : A Practical Approach,” Proc. ECRTS, 2008. [5] S. Bayliss et al., “Methodology for designing statically scheduled application-specific SDRAM controllers using constrained local search, “in Proc. FPT, 2009. [6] B. Akesson et al., “Architectures and modelling of predictable memory controllers for improved system integration,” in Proc. DATE, 2011. [7] J. Reineke et al., “PRET DRAM Controller: Bank Privatization for Predictability and Temporal Isolation,” in Proc. CODES+ISSS, 2011. [8] M. Paolieri et al., “An Analyzable Memory Controller for Hard Real-Time CMPs,” Embedded Systems Letters, IEEE, vol. 1, no. 4, 2009. [9] Micron Technology Inc., “DDR3-800-1Gb SDRAM Datasheet, 02/10 EN edition,” 2006. [10] D. Stiliadis et al., “Latency-rate servers: a general model for analysis of traffic scheduling algorithms,” IEEE/ACM Trans. Netw., 1998. [11] B. Akesson et al., “Classification and Analysis of Predictable Memory Patterns,” in Proc.RTCSA, 2010. [12] DDR2 SDRAM Specification, JESD79-2E ed., JEDEC Solid State Technology Association, 2008. [13] DDR3 SDRAM Specification, JESD79-3D ed., JEDEC Solid State Technology Association, 2009.
  • 73. 63 [14] K. Chandrasekar et al., “Improved Power Modelling of DDR SDRAMs,” in Proc. DSD, 2011. [15] B. Akesson et al., “Automatic Generation of Efficient Predictable Memory Patterns,” in Proc. RTCSA, 2011.