D.Venkata Kishore, C.Ram Kumar / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 3, May-Jun 2013, pp.1152-1155
Design and Implementation of Pipelined FFT Processor
*D.Venkata Kishore, **C.Ram Kumar
*Lecturer, ECE Department, JNTUA College of Engineering Pulivendula
**Lecturer, ECE Department, JNTUA College of Engineering Pulivendula
Abstract
A high-performance FFT processor is needed to meet
the real-time and low-cost requirements of many
different systems. This paper therefore proposes a
radix-2 pipelined FFT processor based on a Field
Programmable Gate Array (FPGA) for Wireless Local
Area Networks (WLAN). Rather than being stored in
a conventional ROM, the twiddle factors in our
pipelined FFT processor can be accessed directly.
A simple new address mapping scheme is also
proposed. The FFT processor has two pipelines: one
in the complex multiplication of the butterfly
unit, and the other between the RAM modules, which
read input data, store the temporary variables of
the butterfly unit, and output the final results.
The pipelined 64-point FFT processor completes a
transform within only 67 clock cycles.
Keywords-FFT; FPGA; address mapping
I. INTRODUCTION
The Fast Fourier Transform (FFT) processor is
widely used in applications such as WLAN, image
processing, spectrum measurement, radar and
multimedia communication services [1]. However,
the FFT algorithm is computationally demanding and
must be carefully designed to obtain an efficient
implementation. If the FFT processor is made
flexible and fast enough, a portable device
equipped with a wireless transmission system
becomes feasible. An efficient FFT processor is
therefore required for real-time operation [2],
and designing a fast FFT processor is a matter of
great significance.
In the past twenty years, FPGAs have developed
rapidly and gradually become universal. Compared
with the traditional ASIC design flow, FPGA-based
designs offer flexibility and a high
performance-to-price ratio. Many researchers have
studied pipelined FFTs based on FPGA [3], [4], [5].
For instance, the authors of [3] proposed an
approach to designing an FFT processor for
wireless applications, but their design takes too
many clock cycles and is not fast enough. In
comparison to these designs, we propose a simple
and feasible pipelined implementation of a 32-bit
64-point FFT processor based on FPGA for WLAN.
This paper is organized as follows. Section II
introduces the basic radix-2 FFT algorithm and
briefly discusses which decimation suits the
system better, then presents a three-multiplication
method and a novel address mapping scheme that
reduces delay and increases the speed of the
system. Section III proposes the pipelined FFT
architecture and describes each unit. Section IV
covers the implementation of the 64-point FFT
processor on FPGA and lists the hardware resources
used. The last section gives the conclusion.
II. FFT ALGORITHM AND ADDRESS
MAPPING SCHEME
A. Radix-2 FFT Algorithm
The FFT algorithm can compute the
Discrete Fourier Transform (DFT) effectively.
Given a sequence {x(n)} of N complex numbers, we
can compute its DFT, another sequence {X(k)} of N
complex numbers, according to the following
formula [6]
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1. \tag{1}$$
Depending on how the sequence is decimated, the
radix-2 FFT algorithm takes one of two forms: DIF
(Decimation in Frequency) and DIT (Decimation in
Time). The DIF algorithm is easier to design than
DIT. Considering finite word length effects, DIF
also has advantages over DIT, such as reducing the
additive noise introduced by fixed-point
multiplication [7] and reducing the complexity of
the whole system. Consequently, we use the DIF
algorithm to design the radix-2 FFT module; most
current FFT processors are also based on this
algorithm [8].
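
As a point of reference for the DIF structure described above, the following is a minimal floating-point sketch in Python. It is not the authors' hardware design; the function name fft_dif_bit_reversed is ours. Each butterfly forms a sum and a difference and multiplies only the difference by a twiddle factor, and the result is left in bit-reversed order.

```python
import cmath

def fft_dif_bit_reversed(x):
    """Radix-2 DIF FFT computed in place; output is left in bit-reversed order."""
    a = list(x)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    span = n // 2                      # distance between the two butterfly inputs
    while span >= 1:
        stride = n // (2 * span)       # twiddle exponent step for this stage
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * (j * stride) / n)  # W_n^(j*stride)
                u, v = a[start + j], a[start + j + span]
                a[start + j] = u + v               # sum: no multiplication
                a[start + j + span] = (u - v) * w  # difference times twiddle
        span //= 2
    return a
```

For a 64-point input the loop runs six stages, and entry i of the returned list holds X(k) with k equal to the 6-bit reversal of i.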
B. Three-multiplication Method
Complex multiplication is the dominant factor
affecting the speed and throughput of an FFT
processor. Computing a complex multiplication
directly requires four real multipliers and two
real adders. In an FPGA, a real multiplier occupies
more hardware area than a real adder, so it pays to
replace multiplications with additions and
subtractions wherever possible. All operands are
32-bit complex numbers. The difference of two
inputs $X_m(i)$ and $X_m(j)$ can be expressed as
$z_1 = x_1 + jy_1$, and the twiddle factor
$W_N^{nk} = \exp(-j2\pi nk/N)$ can be expressed as
$z_2 = x_2 + jy_2$. Their product can likewise be
expressed as $z = x + jy$. That is, the 16-bit real
part $x$ of the product equals $x_1x_2 - y_1y_2$,
and the 16-bit imaginary part $y$ equals
$x_1y_2 + x_2y_1$.
Therefore, the product z can be rewritten as
$$x = x_1(x_2 + y_2) - y_2(x_1 + y_1), \tag{2}$$
$$y = x_1(x_2 + y_2) - x_2(x_1 - y_1). \tag{3}$$
This factorization has several advantages [9]. The
number of real multiplications is reduced from four
to three, because the term x1(x2 + y2) is shared by
(2) and (3). Addition consumes less power than
multiplication, so the system power consumption is
also reduced. In this study, we save sixteen
embedded 9-bit multiplier elements in the FPGA. For
this 64-point FFT processor, the values of x2 + y2,
x1 − y1, x1 + y1, x2 and y2 can be obtained
before they take part in the real multiplications.
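
Equations (2) and (3) can be checked with a few lines of Python. This is only an arithmetic sketch on plain integers (the function name cmul3 is ours); it does not model the 16-bit word lengths of the hardware.

```python
def cmul3(x1, y1, x2, y2):
    """Multiply (x1 + j*y1) by (x2 + j*y2) using three real multiplications."""
    shared = x1 * (x2 + y2)        # product used by both outputs
    x = shared - y2 * (x1 + y1)    # real part, equation (2)
    y = shared - x2 * (x1 - y1)    # imaginary part, equation (3)
    return x, y

# (3 + 4j) * (5 - 2j) = 23 + 14j
print(cmul3(3, 4, 5, -2))
```

The three products are x1(x2 + y2), y2(x1 + y1) and x2(x1 − y1); the price is three extra additions compared with the direct four-multiplier form.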
C. The Novel Address Mapping Scheme
In this paper, the block size of our system is 64
points. Given the properties of the radix-2
64-point FFT, the processor needs to read 8
operands from memory at a time to achieve a
high-speed FFT. Parallel data access is crucial to
such a system [10]. The 8 operands in our design
are therefore located in different rows or columns
of the memory blocks, an arrangement that ensures
8 conflict-free memory accesses can be performed in
parallel. Initially, we use eight 32-bit dual-port
memories to store the 64 operands in sequence. A
new linear-shift conflict-free address mapping
scheme is then adopted to change the addresses of
the operands: the original two-dimensional
addresses of the operands are mapped to new ones.
For instance, let the original two-dimensional
coordinate be (a, b), where a is the address of the
data within one memory and b is the index of that
memory among the 8 memories. The new conflict-free
address (A, B) is then obtained from
$$A = b, \tag{4}$$
$$B = (a + b) \bmod 8. \tag{5}$$
In our design, this guarantees that no memory
location is read from and written to at the same
time, and the new mapping scheme is feasible,
effective and simple.
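
The effect of equations (4) and (5) can be illustrated with a short Python sketch. The helper names remap and target_banks are ours, and the check below only covers groups of 8 operands that originally share an address a or share a memory b, which is our reading of "different row or column".

```python
BANKS = 8  # number of memories, from the text

def remap(a, b, banks=BANKS):
    """Map original (address a, memory b) to (A, B) per equations (4) and (5)."""
    return b, (a + b) % banks

def target_banks(coords):
    """Memories touched after remapping a group of original coordinates."""
    return [remap(a, b)[1] for a, b in coords]

# 8 operands that originally sat in the same memory (b = 3) ...
same_memory = [(a, 3) for a in range(BANKS)]
# ... and 8 operands that originally shared the same address (a = 5).
same_address = [(5, b) for b in range(BANKS)]

print(sorted(target_banks(same_memory)))   # all 8 memories, each used once
print(sorted(target_banks(same_address)))  # all 8 memories, each used once
```

Because a = (B − A) mod 8 and b = A recover the original coordinate, the mapping is one-to-one, so no two operands collide in the remapped layout.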
III. THE PIPELINED FFT PROCESSOR
ARCHITECTURE
For high-throughput systems, a pipelined
architecture is a good choice. Its regular
structure and simple control also make it well
suited to high-speed, long FFTs. The performance of
a pipelined FFT processor can be improved by
optimizing the structure and saving hardware
resources. The block diagram of our proposed FFT
processor is illustrated in Fig. 1. It consists of
four essential units. The control unit, the kernel
of the FFT processor, coordinates the whole system.
The butterfly unit (BU), which has a three-stage
pipelined structure, carries out the complex
multiplication. Two dual-port RAMs are used to
store and output data. The address generator unit
(AGU) produces eight 3-bit read addresses and write
addresses.
Fig 1. The pipelined FFT architecture.
A. Control Unit
The control unit generates all control signals and
is responsible for operation control of the
processor. A 48-bit signal, w_con, controls the
whole FFT processor. From w_con, two parameters,
write_en and read_en, are generated to control the
AGU, along with the sel1 and sel2 signals that
select data from the two RAMs, each of which is
made up of 8 32-bit registers. The BU and the
remaining parts are controlled by w_con as well.
The control unit sequences all steps of the FFT
processor using a 7-bit counter.
B. The DIF Butterfly Unit
In the FFT algorithm, the central component is the
BU, which calculates the sum and difference of two
input data and multiplies the difference by the
twiddle factor.
We use only 11 factors, $W_{64}^{0}$, $W_{64}^{1}$,
$W_{64}^{2}$, $W_{64}^{3}$, $W_{64}^{4}$,
$W_{64}^{5}$, $W_{64}^{6}$, $W_{64}^{7}$,
$W_{64}^{8}$, $W_{64}^{16}$ and $W_{64}^{24}$, to
express all 32 factors needed in this FFT
processor, because a twiddle factor $W_N^{nk}$ can
be separated into two components. For instance,
$W_{64}^{28}$ can be derived from the product of
$W_{64}^{4}$ and $W_{64}^{24}$. Moreover,
$W_{64}^{0}$ is the constant 1, and
$W_{64}^{16} = -j$, so multiplying by it only
requires negating the real part of the data and
then swapping the real and imaginary parts. These
two factors can therefore be eliminated, so only 9
twiddle factors are actually needed. We represent
these 9 twiddle factors as 16-bit fixed-point
numbers. Although fixed-point arithmetic is not
perfectly precise, it satisfies the requirements
of general systems.
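
A floating-point sketch of this decomposition is given below. The split of the exponent k into a low part (0 to 7) and a high part (0, 8, 16 or 24) is our assumption about how the 11 stored factors combine; the text only states that each factor separates into two components. All names are illustrative.

```python
import cmath

N = 64

def w(k):
    """Twiddle factor W_64^k = exp(-j*2*pi*k/64)."""
    return cmath.exp(-2j * cmath.pi * k / N)

# The 11 base exponents listed in the text.
BASE = [0, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24]
TABLE = {k: w(k) for k in BASE}

def times_minus_j(z):
    """Multiply by W_64^16 = -j: negate the real part, then swap the parts."""
    return complex(z.imag, -z.real)

def twiddle(k):
    """Build W_64^k for 0 <= k < 32 from at most two stored factors."""
    if k in TABLE:
        return TABLE[k]
    high = (k // 8) * 8          # assumed split: 0, 8, 16 or 24
    low = k - high               # remaining exponent, 0..7
    return TABLE[low] * TABLE[high]
```

Under this split, twiddle(28) multiplies the stored factors for exponents 4 and 24, and a multiplication by the factor with exponent 16 can be replaced by times_minus_j with no multiplier at all.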
Fig 2. Block diagram of BU.
Because the multiplication of twiddle factors and
the corresponding data dominates the computation,
a three-stage pipeline is used for the complex
multiplication to obtain high-speed computation.
The architecture of the BU is shown in Fig. 2.
In the BU, both complex inputs are 32 bits wide,
comprising a 16-bit real part and a 16-bit
imaginary part. Their sum must be scaled down by a
factor of 2 to avoid arithmetic overflow, and the
same operation is applied to their difference. The
factor $W_{64}^{0}$ and the first parameter con_s
of the second multiplexer bypass the complex
multiplier and can be treated as the constant 1, as
depicted in Fig. 2. This reduces the power
consumption of the complex multiplier and saves
hardware resources. Some points deserve emphasis.
The difference and the twiddle factors are both 32
bits, so the result of the first complex multiplier
is 64 bits. Because we adopt fixed-point
computation, we truncate it to a 32-bit parameter
f_mult, which is the input of the second complex
multiplier.
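
The scaling and truncation steps can be sketched in Python integer arithmetic. The Q1.15 format (15 fractional bits) is our assumption; the paper only specifies 16-bit fixed-point parts. The function names are ours, and the sketch models neither saturation nor the pipeline registers.

```python
Q = 15  # assumed fractional bits (Q1.15); not stated in the paper

def to_q15(v):
    """Quantize a real value in [-1, 1) to a signed 16-bit integer (assumed Q1.15)."""
    return max(-32768, min(32767, int(round(v * (1 << Q)))))

def dif_butterfly_fixed(ar, ai, br, bi, wr, wi):
    """One radix-2 DIF butterfly on integer parts, with halving and truncation."""
    sr, si = (ar + br) >> 1, (ai + bi) >> 1   # sum, halved to avoid overflow
    dr, di = (ar - br) >> 1, (ai - bi) >> 1   # difference, halved likewise
    shared = dr * (wr + wi)                   # three-multiplication form, eqs. (2)-(3)
    p_re = (shared - wi * (dr + di)) >> Q     # drop the low bits of the wide product
    p_im = (shared - wr * (dr - di)) >> Q
    return (sr, si), (p_re, p_im)

# Example: twiddle factor exp(-j*pi/4) quantized to the assumed format.
wr, wi = to_q15(0.7071067811865476), to_q15(-0.7071067811865476)
print(dif_butterfly_fixed(1000, -2000, 300, 400, wr, wi))
```

The right shift discards the low-order bits of each product, which is the source of the small truncation error that fixed-point designs of this kind accept.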
The most notable feature of this unit is that
three 32-bit registers realize the three-stage
pipeline of the butterfly transform: register1
stores the difference, register2 stores the
truncated second-stage result f_mult, and
register3 stores the final result cmul_b.
C. The RAM Unit
RAM1 and RAM2 are each made up of 8 32-bit
registers. Data is always written to the outer
memories from RAM2 and always read from the outer
memories into RAM1. We now introduce the key
algorithm used in this unit. Considering the
properties of the 64-point FFT, we can use the
radix-2 DIF 8-point FFT as a whole unit, so only
two stages are needed to accomplish the 64-point
FFT. These two stages are equivalent to the six
stages of the standard radix-2 DIF 64-point FFT.
The system reads 8 32-bit operands in parallel
from the outer memories into RAM1 at a time, so
only sixteen reads are needed. An example of this
algorithm is shown in Fig. 3.
Fig 3. The example of radix-2 8-point FFT.
The pipeline in the RAM units works as follows.
First, the system reads 8 operands from the outer
memories and writes them to RAM1; after they are
computed, the results are stored in RAM2.
Meanwhile, the system reads the next 8 operands.
Subsequently, the system accesses operands from
RAM1 or RAM2 to compute these 16 operands. Within
a single clock cycle, four butterfly computations
are executed simultaneously. The final results of
these 16 operands are all stored in RAM2 and then
written to the outer memories. At the same time,
another 8 operands are read. Accordingly, only 67
clock cycles are needed to complete this radix-2
DIF 64-point FFT.
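
One way to see why two passes of 8-point transforms suffice is the standard 8 × 8 index factorization of a 64-point DFT. The Python sketch below verifies that identity in floating point; it is not necessarily the authors' exact dataflow or operand ordering, and the function names are ours.

```python
import cmath

def dft(x):
    """Direct DFT, used here as the 8-point building block and as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def fft64_two_pass(x):
    """64-point DFT as two passes of eight 8-point DFTs with twiddles in between."""
    assert len(x) == 64
    # Pass 1: for each n2, an 8-point DFT over the samples x[8*n1 + n2].
    inner = [dft([x[8 * n1 + n2] for n1 in range(8)]) for n2 in range(8)]
    # Inter-pass twiddle factors W_64^(n2*k1).
    for n2 in range(8):
        for k1 in range(8):
            inner[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / 64)
    # Pass 2: for each k1, an 8-point DFT over n2; result index is k1 + 8*k2.
    out = [0j] * 64
    for k1 in range(8):
        col = dft([inner[n2][k1] for n2 in range(8)])
        for k2 in range(8):
            out[k1 + 8 * k2] = col[k2]
    return out
```

Each pass touches the 64 operands in 8 groups of 8, which matches the sixteen 8-operand reads described above.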
D. AGU
The AGU is also an important unit. It creates 8
read and 8 write addresses, which determine the
data accesses to the outer memories.
In this FFT processor, we adopt in-place
computation to make the system simpler and faster;
that is, we write the results back to the
locations from which the operands were read. In
contrast to the sequence of 64 input operands
configured by address mapping, the final output
sequence of this FFT processor is in bit-reversed
order and needs to be adjusted to normal order.
These adjustments can instead be made before the
FFT computation. So, whether we adjust the
sequence of inputs or outputs, the performance of
our FFT processor is not degraded.
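
The bit-reversed ordering mentioned above can be illustrated with a small Python sketch. For 64 points each index has 6 bits; the helper names are ours and the code says nothing about how the AGU implements the reordering in hardware.

```python
def bit_reverse(index, bits=6):
    """Reverse the lowest `bits` bits of index (6 bits address 64 points)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def reorder(data, bits=6):
    """Permute a list of 2**bits items by bit-reversing each index."""
    return [data[bit_reverse(k, bits)] for k in range(1 << bits)]

natural = list(range(64))
scrambled = reorder(natural)
print(scrambled[:8])                   # first few bit-reversed indices
print(reorder(scrambled) == natural)   # the permutation undoes itself
```

Because bit reversal is its own inverse, the same address permutation serves whether it is applied on the way into the memories or on the way out.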
IV. HARDWARE RESOURCES
Functional simulation and timing simulation were
completed successfully. The main hardware
resources of this design are as follows. The
device is the EP2C70F896C6 of the Cyclone II
family. The total logic elements are
562/68,416 (8%), the total pins are 563/622 (91%)
and the total embedded 9-bit multiplier elements
are 48/300 (16%). The clock frequency is
31.69 MHz. The design in [3] is a fixed-point
16-bit 64-point FFT processor that requires 92
clock cycles in total, whereas ours requires 67.
Our FFT processor also has a higher speed and
lower power consumption.
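
As a quick consistency check, the reported cycle count and clock frequency imply the time taken by one transform. The Python lines below are plain arithmetic on the two figures quoted above; the names are ours.

```python
CYCLES_PER_FFT = 67    # clock cycles per 64-point FFT, from the text
CLOCK_HZ = 31.69e6     # reported clock frequency, 31.69 MHz

def fft_time_us(cycles=CYCLES_PER_FFT, clock_hz=CLOCK_HZ):
    """Time for one transform in microseconds."""
    return cycles / clock_hz * 1e6

print(round(fft_time_us(), 2))  # about 2.11 microseconds
```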
V. CONCLUSION
This paper proposes a novel radix-2 FFT processor
based on FPGA for WLAN, using Verilog HDL as the
hardware description language and Quartus II as
the design and synthesis tool. To achieve high
throughput, pipelined architectures are used in
the butterfly unit and the dual-port RAM. The
dedicated parallel-pipelined FFT processor
architecture can process input data at high speed,
and the simple new address mapping scheme greatly
improves overall system performance. For radix-2
systems, this mapping scheme is better and simpler
than most others. The design is implemented on an
FPGA chip, and the pipelined processor completes a
complex 64-point FFT within 2.1 μs. The hardware
test results show that it can meet the
requirements of WLAN.
REFERENCES
[1] J. A. C. Bingham, “Multicarrier modulation for data transmission: an idea whose time has come,” IEEE Communications Magazine, vol. 28, no. 5, pp. 5-14, May 1990.
[2] J. Palicot and C. Roland, “FFT: a basic function for a reconfigurable receiver,” 10th International Conference on Telecommunications, vol. 1, pp. 898-902, March 2003.
[3] Min Jiang, Bing Yang, Yiling Fu, et al., “Design of FFT processor with low power complex multiplier for OFDM-based high-speed wireless applications,” International Symposium on Communications and Information Technology, vol. 2, pp. 639-641, Oct. 2004.
[4] Kai Zhong, Hui He, and Guangxi Zhu, “An ultra-high speed FFT processor,” International Symposium on Signals, Circuits and Systems, vol. 1, pp. 37-40, July 2003.
[5] Hongjiang He and Hui Guo, “The Realization of FFT Algorithm based on FPGA Co-processor,” Second International Symposium on Intelligent Information Technology Application, vol. 3, pp. 239-243, Dec. 2008.
[6] J. G. Proakis and D. G. Manolakis, “Introduction to Digital Signal Processing,” New York: Macmillan, 1988.
[7] R. B. Perlow and T. C. Denk, “Finite word length design for VLSI FFT processors,” Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1227-1231, Nov. 2001.
[8] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. of Comp., vol. 19, no. 90, pp. 297-301, April 1965.
[9] S. Oraintara, Y. J. Chen, and T. Q. Nguyen, “Integer fast Fourier transform,” IEEE Trans. Acoustics, Speech, Signal Processing, vol. 50, no. 3, pp. 607-618, March 2002.
[10] J. H. Takala, T. S. Jarvinen, and H. T. Sorokin, “A Conflict-free parallel memory access scheme for FFT processors,” International Symposium on Circuits and Systems, vol. 4, pp. 524-527, May 2003.

Design  of  Optimized  FIR  Filter  Using  FCSD Representation Design  of  Optimized  FIR  Filter  Using  FCSD Representation
Design of Optimized FIR Filter Using FCSD Representation
 
H344250
H344250H344250
H344250
 
Serial interface module for ethernet based applications
Serial interface module for ethernet based applicationsSerial interface module for ethernet based applications
Serial interface module for ethernet based applications
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
 
Survey of Optimization of FFT processor for OFDM Receivers
Survey of Optimization of FFT processor for OFDM ReceiversSurvey of Optimization of FFT processor for OFDM Receivers
Survey of Optimization of FFT processor for OFDM Receivers
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
J0166875
J0166875J0166875
J0166875
 
Efficient FPGA implementation of high speed digital delay for wideband beamfor...
Efficient FPGA implementation of high speed digital delay for wideband beamfor...Efficient FPGA implementation of high speed digital delay for wideband beamfor...
Efficient FPGA implementation of high speed digital delay for wideband beamfor...
 
FPGA based Efficient Interpolator design using DALUT Algorithm
FPGA based Efficient Interpolator design using DALUT AlgorithmFPGA based Efficient Interpolator design using DALUT Algorithm
FPGA based Efficient Interpolator design using DALUT Algorithm
 
FPGA based Efficient Interpolator design using DALUT Algorithm
FPGA based Efficient Interpolator design using DALUT AlgorithmFPGA based Efficient Interpolator design using DALUT Algorithm
FPGA based Efficient Interpolator design using DALUT Algorithm
 
Noise Immune and Area Optimized Serial Interface for FPGA based Industrial In...
Noise Immune and Area Optimized Serial Interface for FPGA based Industrial In...Noise Immune and Area Optimized Serial Interface for FPGA based Industrial In...
Noise Immune and Area Optimized Serial Interface for FPGA based Industrial In...
 
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...
 
20120130406009 2
20120130406009 220120130406009 2
20120130406009 2
 
Multiple Valued Logic for Synthesis and Simulation of Digital Circuits
Multiple Valued Logic for Synthesis and Simulation of Digital CircuitsMultiple Valued Logic for Synthesis and Simulation of Digital Circuits
Multiple Valued Logic for Synthesis and Simulation of Digital Circuits
 
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...
 

D.Venkata Kishore, C.Ram Kumar / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.1152-1155

Design and Implementation of Pipelined FFT Processor

*D.Venkata Kishore, **C.Ram Kumar
*Lecturer, ECE Department, JNTUA College of Engineering Pulivendula
**Lecturer, ECE Department, JNTUA College of Engineering Pulivendula

Abstract
A high-performance FFT processor is needed to meet the real-time and low-cost requirements of many different systems. We therefore propose a radix-2 pipelined FFT processor based on a Field Programmable Gate Array (FPGA) for Wireless Local Area Networks (WLAN). Instead of being stored in a traditional ROM, the twiddle factors in our pipelined FFT processor can be accessed directly. A novel, simple address mapping scheme is also proposed. The FFT processor has two pipelines: one in the complex multiplication of the butterfly unit, and the other between the RAM modules, which read input data, store temporary variables of the butterfly unit and output the final results. As a result, the pipelined 64-point FFT processor completes a transform within only 67 clock cycles.

Keywords-FFT; FPGA; address mapping

I. INTRODUCTION
The Fast Fourier Transform (FFT) processor is widely used in applications such as WLAN, image processing, spectrum measurement, radar and multimedia communication services [1]. However, the FFT algorithm is computationally demanding and must be designed carefully to obtain an efficient implementation. If the FFT processor is made flexible and fast enough, a portable device equipped with a wireless transmission system becomes feasible. Therefore, an efficient FFT processor is required for real-time operation [2], and designing a fast FFT processor is a matter of great significance.

In the past twenty years, FPGAs have developed rapidly and gradually become ubiquitous.
Compared with the traditional ASIC design flow, FPGA-based designs offer flexibility and a high performance-to-price ratio. Many researchers have studied FPGA-based pipelined FFTs [3], [4], [5]. For instance, [3] proposed an approach to designing an FFT processor for wireless applications, but that design takes too many clock cycles and is not fast enough. In comparison with these designs, we propose a simple and feasible pipelined implementation of a 32-bit 64-point FFT processor based on FPGA for WLAN.

This paper is organized as follows. The next section introduces the basic radix-2 FFT algorithm, briefly discusses which decimation suits the system better, and presents a three-multiplication method and a novel address mapping scheme that reduces delay and increases the speed of the system. Section III proposes the pipelined FFT architecture and describes each unit. Section IV covers the implementation of the 64-point FFT processor on FPGA and lists the hardware resources explicitly. The last section gives the conclusion.

II. FFT ALGORITHM AND ADDRESS MAPPING SCHEME
A. Radix-2 FFT Algorithm
The FFT algorithm computes the Discrete Fourier Transform (DFT) efficiently. Given a sequence {x(n)} of N complex numbers, its DFT, another sequence {X(k)} of N complex numbers, is computed according to the following formula [6]

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \qquad (1)$$

where $W_N = e^{-j 2\pi / N}$.

According to the way the sequence is decimated, the algorithm falls into two types, DIF (Decimation in Frequency) and DIT (Decimation in Time). The DIF algorithm is easier to design than DIT. Considering the finite word length effect, DIF also has further advantages over DIT, such as reducing the additive noise introduced by multiplication in a fixed-point implementation [7] and reducing the complexity of the whole system.
Consequently, we use the DIF algorithm to design the radix-2 FFT module; most current FFT processors are also based on this algorithm [8].

B. Three-multiplication Method
Complex multiplication is the dominant factor affecting the speed and throughput of an FFT processor. Computing a complex multiplication requires four real multipliers and two real adders. In an FPGA, the hardware area of a real multiplier is larger than that of a real adder, so we should convert as much of the complex multiplication as possible into additions and subtractions to optimize overall performance.

All operands are 32-bit complex numbers. The difference of two inputs $X_m(i)$ and $X_m(j)$ can be expressed as $z_1 = x_1 + j y_1$, and the twiddle factor $W_N^{nk} = \exp(-j 2\pi nk / N)$ can be expressed as $z_2 = x_2 + j y_2$. Their product can likewise be expressed as $z = x + j y$. That is, the 16-bit real part $x$ of the product equals $x_1 x_2 - y_1 y_2$ and the 16-bit imaginary part $y$ equals $x_1 y_2 + x_2 y_1$. The product $z$ can therefore be rewritten as

$$x = x_1 (x_2 + y_2) - y_2 (x_1 + y_1), \qquad (2)$$
$$y = x_1 (x_2 + y_2) - x_2 (x_1 - y_1). \qquad (3)$$

This factorization has several advantages [9]. Because the term $x_1 (x_2 + y_2)$ is shared by both equations, the number of real multiplications is reduced from four to three. Addition consumes less power than multiplication, so the system power consumption is also reduced. In this study, we save sixteen embedded multiplier 9-bit elements in the FPGA. For this 64-point FFT processor, the values of $x_2 + y_2$, $x_1 - y_1$, $x_1 + y_1$, $x_2$ and $y_2$ can be obtained before they enter the real multiplications.

C. The Novel Address Mapping Scheme
In this paper, the block size of our system is 64 points. Given the properties of the radix-2 64-point FFT, the processor needs to read 8 operands from memory at a time to achieve a high-speed FFT. Parallel data access is crucial to such a system [10]. Thus, the 8 operands in our design are located in different rows or columns of the memory blocks, and this arrangement ensures that 8 conflict-free memory accesses can be performed in parallel. Initially, we use 8 32-bit dual-port memories to store the 64 operands in sequence. A new linear-shift conflict-free address mapping scheme is then adopted to change the addresses of the operands. The original two-dimensional addresses of the operands are mapped to new ones.
For instance, assume that the original two-dimensional coordinate is (a, b), in which a is the address of the data within one memory and b is the index of that memory among the 8 memories. We then obtain a new conflict-free address (A, B) by means of the following equations

$$A = b, \qquad (4)$$
$$B = (a + b) \bmod 8. \qquad (5)$$

In our design, this ensures that no memory location is read from or written to by two accesses at the same time, and the new mapping scheme is feasible, effective and simple.

III. THE PIPELINED FFT PROCESSOR ARCHITECTURE
For high-throughput systems, a pipelined architecture is a good choice, and it is also an ideal method for implementing a high-speed, long FFT owing to its regular structure and simple control. The performance of a pipelined FFT processor can be improved by optimizing the structure and saving hardware resources.

The block diagram of our proposed FFT processor is illustrated in Fig. 1. It consists of four essential units. The control unit, the kernel of the FFT processor, coordinates the whole system. The butterfly unit (BU), which has a three-stage pipelined structure, carries out the complex multiplication. Two dual-port RAMs are used to store and output data. The AGU (address generator unit) produces 8 3-bit read addresses and write addresses.

Fig. 1. The pipelined FFT architecture.

A. Control Unit
The control unit generates all control signals for the whole system and is responsible for operation control of the processor. A 48-bit signal w_con controls the whole FFT processor. This signal generates two parameters, write_en and read_en, to control the AGU. It also generates the sel1 and sel2 signals to select data from the two RAMs, each of which is made up of 8 32-bit registers. The BU and the remaining parts are controlled by w_con as well. The control unit coordinates all steps of the FFT processor based on a 7-bit counter.

B. The DIF Butterfly Unit
The central component of the FFT algorithm is the BU, which calculates the sum and difference of two input data and computes the product of the difference and the twiddle factor.
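The butterfly operation described above, combined with the three-multiplication product of Eqs. (2) and (3), can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the authors' Verilog design: it uses floating-point values rather than 16-bit fixed-point words, it omits pipelining, and the function names `three_mult_product` and `dif_butterfly` are hypothetical.

```python
def three_mult_product(x1, y1, x2, y2):
    """Return (x, y) with x + jy = (x1 + j*y1) * (x2 + j*y2).

    Uses three real multiplications, following Eqs. (2) and (3).
    """
    shared = x1 * (x2 + y2)        # multiplication 1, shared by both parts
    x = shared - y2 * (x1 + y1)    # multiplication 2, real part (Eq. 2)
    y = shared - x2 * (x1 - y1)    # multiplication 3, imaginary part (Eq. 3)
    return x, y


def dif_butterfly(p, q, w):
    """Radix-2 DIF butterfly on complex inputs p and q with twiddle factor w.

    Returns the sum p + q and the product (p - q) * w.
    """
    total = p + q
    diff = p - q
    x, y = three_mult_product(diff.real, diff.imag, w.real, w.imag)
    return total, complex(x, y)
```

For example, `three_mult_product(1, 2, 3, 4)` returns `(-5, 10)`, which matches (1 + 2j)(3 + 4j) = -5 + 10j computed with the usual four multiplications.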
We use only 11 factors, $W_{64}^{0}$, $W_{64}^{1}$, $W_{64}^{2}$, $W_{64}^{3}$, $W_{64}^{4}$, $W_{64}^{5}$, $W_{64}^{6}$, $W_{64}^{7}$, $W_{64}^{8}$, $W_{64}^{16}$ and $W_{64}^{24}$, to express all 32 factors needed in this FFT processor, because the twiddle factor $W_N^{nk}$ can be separated into two components. For instance, $W_{64}^{28}$ can be derived from the product of $W_{64}^{4}$ and $W_{64}^{24}$. Moreover, the value of $W_{64}^{0}$ is the constant 1, and for $W_{64}^{16} = -j$ it suffices to negate the real part of the operand and then swap its real and imaginary parts. We can therefore eliminate these two factors; that is, only 9 twiddle factors actually need to be stored. In addition, we use 16-bit fixed-point numbers to express these 9 twiddle factors. Although fixed-point arithmetic is not fully precise, it satisfies the requirements of general systems.

Fig. 2. Block diagram of BU.

Because the multiplication of twiddle factors and the corresponding data is so important, a three-stage pipeline structure is used for the complex multiplication to obtain high-speed computation. The architecture of the BU is shown in Fig. 2. In the BU, both complex inputs are 32 bits wide, comprising a 16-bit real part and a 16-bit imaginary part. Their sum is scaled down by a factor of 2 to avoid arithmetic overflow, and the same operation is applied to their difference. The factor $W_{64}^{0}$ and the first parameter con_s of the second multiplexer are not involved in the complex multiplier; they can be treated as the constant 1, as depicted in Fig. 2. Thus, the power consumption of the complex multiplier is reduced and hardware resources are saved. Some points deserve emphasis.
The difference and the twiddle factors are both 32 bits wide, so the result of the first complex multiplier is 64 bits wide. Because we adopt fixed-point computation, this result is truncated to a 32-bit parameter f_mult, which is the input of the second complex multiplier. The most notable advantage of this unit is that we use 3 32-bit registers to realize the three-stage pipeline of the butterfly transform: register1 stores the difference, register2 stores the truncated second-stage result f_mult, and register3 stores the final result cmul_b.

C. The RAM Unit
RAM1 and RAM2 are each made up of 8 32-bit registers. Data is always written to the external memories from RAM2, and it is always read from the external memories into RAM1. We now introduce the key algorithm used in this unit. Given the properties of the 64-point FFT, we can use the radix-2 DIF 8-point FFT as a whole unit, so only two stages are needed to accomplish the 64-point FFT. These two stages are equivalent to the six stages of the standard radix-2 DIF 64-point FFT. The system reads 8 32-bit operands in parallel from the external memories into RAM1 at a time, so only sixteen reads are needed. An example of this algorithm is shown in Fig. 3.

Fig. 3. The example of radix-2 8-point FFT.

The pipeline in the RAM units works as follows. First, the system reads 8 operands from the external memories and writes them to RAM1; after they are computed, the results are stored in RAM2. Meanwhile, the system reads the next 8 operands. Subsequently, the system accesses operands from RAM1 or RAM2 to compute these 16 operands. Within a single clock cycle, four butterfly computations are executed simultaneously. The final results of these 16 operands are all stored in RAM2 and then written to the external memories. At the same time, another 8 operands are read. Accordingly, only 67 clock cycles are needed to complete this radix-2 DIF 64-point FFT.

D. AGU
Compared with the other units, the AGU is also quite important. It creates 8 read addresses and 8 write addresses, which determine the data access to the external memories.
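The mapping of Eqs. (4) and (5) is what allows 8 such accesses to proceed in parallel. The following minimal Python sketch checks that property under an assumption the text does not spell out: that the groups of 8 operands fetched together form either a row (same address a, all memories) or a column (one memory b, all addresses) of the original 8 x 8 layout. The names `map_address` and `is_conflict_free` are hypothetical.

```python
BANKS = 8  # the design uses 8 memories holding 8 operands each


def map_address(a, b, banks=BANKS):
    """Map the original (address a, memory b) to (A, B) per Eqs. (4)-(5)."""
    return b, (a + b) % banks


def is_conflict_free(coords, banks=BANKS):
    """True if the mapped operands all land in different memories."""
    hit = [map_address(a, b, banks)[1] for a, b in coords]
    return len(set(hit)) == len(hit)


# Assumed access groups: rows and columns of the original layout.
rows = [[(a, b) for b in range(BANKS)] for a in range(BANKS)]
columns = [[(a, b) for a in range(BANKS)] for b in range(BANKS)]
```

Under this assumption every row and every column passes `is_conflict_free`. Before the mapping, a column sits entirely inside one memory, so its 8 operands could not be read in the same cycle.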
In this FFT processor, we adopt the in-place computation method to make the system simpler and faster. That is, we write the results back to the locations from which the operands were read. In contrast to the sequence of 64 input operands configured by address mapping, the final output sequence of this FFT processor is in bit-reversed order and needs to be adjusted to normal order. We can make these adjustments before the FFT computation, so although we adjust the sequence of inputs or outputs, the performance of our FFT processor is not degraded.

IV. HARDWARE RESOURCES
The functional simulation and timing simulation were completed successfully. The main hardware resources of this design are as follows. The device is the EP2C70F896C6 of the Cyclone II family. The total logic elements are 562/68,416 (8%), the total pins are 563/622 (91%) and the total embedded multiplier 9-bit elements are 48/300 (16%). The clock frequency is 31.69 MHz. In [3], a fixed-point 16-bit 64-point FFT processor with 92 clock cycles in total was proposed, whereas our design needs 67 clock cycles. Our FFT processor also has a higher speed and lower power consumption.

V. CONCLUSION
This paper proposes a novel radix-2 FFT processor based on FPGA for WLAN, using Verilog HDL as the hardware description language and Quartus II as the design and synthesis tool. To achieve high throughput, pipelined architectures are used in the butterfly unit and the dual-port RAM. The dedicated parallel-pipelined FFT processor architecture can process input data at high speed, and the performance of the whole system is greatly improved by adopting a novel, simple address mapping scheme. For a radix-2 system, this mapping scheme is better and simpler than most others. The design is implemented on an FPGA chip.
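The cycle count and clock frequency reported in Section IV determine the transform time. As a rough check, assuming all 67 cycles run at the reported 31.69 MHz clock:

$$t = \frac{67}{31.69\ \text{MHz}} \approx 2.11\ \mu\text{s}$$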
This pipelined FFT completes a complex 64-point FFT within 2.1 μs. The hardware testing result shows that it can meet the requirements of WLAN.

REFERENCES
[1] J. A. C. Bingham, “Multicarrier modulation for data transmission: an idea whose time has come,” IEEE Communications Magazine, vol. 28, no. 5, pp. 5-14, May 1990.
[2] J. Palicot and C. Roland, “FFT: a basic function for a reconfigurable receiver,” 10th International Conference on Telecommunications, vol. 1, pp. 898-902, March 2003.
[3] Min Jiang, Bing Yang, Yiling Fu, et al., “Design of FFT processor with low power complex multiplier for OFDM-based high-speed wireless applications,” International Symposium on Communications and Information Technology, vol. 2, pp. 639-641, Oct. 2004.
[4] Kai Zhong, Hui He, and Guangxi Zhu, “An ultra-high speed FFT processor,” International Symposium on Signals, Circuits and Systems, vol. 1, pp. 37-40, July 2003.
[5] Hongjiang He and Hui Guo, “The Realization of FFT Algorithm based on FPGA Co-processor,” Second International Symposium on Intelligent Information Technology Application, vol. 3, pp. 239-243, Dec. 2008.
[6] J. G. Proakis and D. G. Manolakis, “Introduction to Digital Signal Processing,” New York: Macmillan, 1988.
[7] R. B. Perlow and T. C. Denk, “Finite word length design for VLSI FFT processors,” Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1227-1231, Nov. 2001.
[8] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. of Comp., vol. 19, no. 90, pp. 297-301, April 1965.
[9] S. Oraintara, Y. J. Chen, and T. Q. Nguyen, “Integer fast Fourier transform,” IEEE Trans. Acoustics, Speech, Signal Processing, vol. 50, no. 3, pp. 607-618, March 2002.
[10] J. H. Takala, T. S. Jarvinen, and H. T. Sorokin, “A Conflict-free parallel memory access scheme for FFT processors,” International Symposium on Circuits and Systems, vol. 4, pp. 524-527, May 2003.