TELKOMNIKA, Vol.16, No.2, April 2018, pp. 463~470
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v16i2.4153 ◼ 463
Received November 5, 2016; Revised December 19, 2016; Accepted March 12, 2018
Research of 64-bits RISC Dual-core Microprocessor
with High Performance and Low Power Consumption
Gang Zou*, Zhibiao Shao, Linghao Li
Electron and Information Engineering College, Xi’an Jiaotong University
No. 28, Xianning West Road, Xi'an, Shaanxi, 710049, P.R. China
*Corresponding author, e-mail: 99887406@qq.com
Abstract
A 64-bit RISC dual-core microprocessor with high performance and low power consumption is
presented in this paper. The processor has a symmetric architecture with two cores, each of which
has a three-stage pipeline, a 64-bit data path and a 64-bit address port. A novel shared register
module, a redundant Booth3 algorithm and a leapfrog Wallace tree architecture are introduced, and
both the performance and the power consumption are improved considerably. As the FPGA simulation
results indicate, the power consumption is decreased by 14% and the longest data path is shortened
by 25%.
Keywords: Dual-core, Booth algorithm, Wallace tree
Copyright © 2018 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
In terms of processing ability, the instruction and data word length of microprocessors has
gradually grown from 32 bits to 64 bits to meet growing computational requirements [1]. In terms
of overall architecture, on the one hand, RISC (Reduced Instruction Set Computing) has become
the main trend, replacing CISC (Complex Instruction Set Computer), because of its concise
instruction set and efficient implementation. On the other hand, multi-core microprocessors
have become the focus of both research and commercial development, replacing single-core designs
to avoid the power consumption and other problems that arise as line widths approach their
ultimate limit and integration levels increase. In general, microprocessor design is trending
toward long word lengths, reduced instruction sets and single-chip multi-core architectures.
This paper describes the design of a 64-bit RISC dual-core microprocessor. A simple and
effective dual-core architecture and control mechanism is achieved by using a novel
shared register module. A high processing speed is achieved by using a redundant Booth3
algorithm. The dynamic power consumption is decreased by using a leapfrog Wallace tree
architecture. The microprocessor satisfies the needs of high-performance, low-power
applications.
2. Architecture of Dual-core
The structure of the symmetric dual-core microprocessor designed in this paper is shown
in Figure 1, and the structure of a single core in the chip is shown in Figure 2. Resource sharing
and data exchange are achieved effectively by using the shared register module. In a single-task
program the instructions have a fixed order. Depending on the order in which two instructions read
and write the same register, there are four kinds of data hazard (data race), described below
(assuming two instructions named i and j, with instruction i fetched before instruction j).
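Figure 1. Architecture of dual-core (blocks shown: pipeline control, Core I, Core II, shared registers, stack, I/O port, shared on-chip cache)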
RAW (Read After Write) correlation. Instruction j reads source register Rx before
instruction i has written its result to destination register Rx. Instruction j therefore gets a
stale operand, which is a wrong operand.
Figure 2. Architecture of single-core (blocks shown: decoding and controlling unit, instruction register, instruction cache, PC register, registers, division register, address adder, multiplexer and address latch, 64-bit address port, bus preselection unit, bus arbitration unit, arithmetic logic, multiplier, barrel shifter, field extraction, field expansion, field compression, temporary storage of multiplexer data, data output latch, 64-bit data input port, 64-bit data output port, JTAG unit; buses A, B and C; signals NM, INT, CLK, RST, OE, WE)
WAW (Write After Write) correlation. Instruction i and instruction j write the same
destination register Rx, but the write of instruction j occurs earlier than, or at the same
time as, that of instruction i. The writes therefore occur in the wrong order, so that
the value of destination register Rx becomes indeterminate or comes from instruction i
instead of instruction j.
WAR (Write After Read) correlation. Instruction j writes its result to destination register
Rx before instruction i reads source register Rx. Instruction i therefore gets the new value,
which is a wrong operand.
RAR (Read After Read) correlation. Instruction i and instruction j read
the same source register Rx. This situation does not cause a data hazard.
Figure 3. RAW correlation
As shown above, all four kinds of correlation arise from operations on the same
register. RAR correlation obviously does not lead to a data hazard. WAR correlation does not
need to be handled either. This dual-core uses a sequential instruction issue strategy, and
the pipeline of each core reads source operands in the instruction decoding stage and writes
destination operands in the execution stage. Source operands are therefore always read
before destination operands are written, so WAR correlation cannot occur. Because of the
sequential issue strategy, the instruction execution order of this dual-core is the same as
the program order. WAW correlation can therefore be avoided simply by writing
the result of instruction j to the destination register when two adjacent instructions write
the same register. The only correlation that needs to be handled is thus RAW.
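The four correlations defined above can be sketched as a small classifier. This is a minimal illustration, not the paper's hardware logic: the function names and the register-name strings are assumptions introduced here, and instruction i is taken to precede instruction j in program order.

```python
from typing import Optional, Sequence, Set


def classify_dependences(i_dest: Optional[str], i_srcs: Sequence[str],
                         j_dest: Optional[str], j_srcs: Sequence[str]) -> Set[str]:
    """Return the register correlations from earlier instruction i to later instruction j."""
    found = set()
    if i_dest is not None and i_dest in j_srcs:
        found.add("RAW")  # j reads what i writes
    if i_dest is not None and i_dest == j_dest:
        found.add("WAW")  # both write the same register
    if j_dest is not None and j_dest in i_srcs:
        found.add("WAR")  # j writes what i reads
    if set(i_srcs) & set(j_srcs):
        found.add("RAR")  # both read a common register
    return found


def needs_handling(i_dest, i_srcs, j_dest, j_srcs) -> bool:
    """Under the in-order issue strategy described in the text, only RAW needs handling."""
    return "RAW" in classify_dependences(i_dest, i_srcs, j_dest, j_srcs)


# Hypothetical pair: i is R1 = R2 + R3, j is R4 = R1 + R2.
example = classify_dependences("R1", ("R2", "R3"), "R4", ("R1", "R2"))
```

For the hypothetical pair in the last line, the classifier reports a RAW correlation on R1 and a harmless RAR correlation on R2.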
The solution to the RAW data hazard is as follows. There are two kinds of RAW correlation,
depending on whether the correct value to be written to the register has already been produced
when instruction j reads the operand in the decoding stage. If the correct value has been
produced, there is no correlation between the two cores, because neither core has yet entered
the execution stage. Only the RAW correlation in which the correct value has not yet been
produced when instruction j reads the operand in the instruction decoding stage needs to be handled.
As shown in Figure 3, assume that there is a RAW correlation between instructions M
and M+1, which means that the destination operand of instruction M is a source operand of
instruction M+1. In the first cycle, core I fetches instruction M and, at the same
time, core II fetches instruction M+1. In the second cycle, core I and core II fetch instructions
M+2 and M+3, and at the same time the RAW correlation is detected in the decoding stage.
At this point, the instruction registers are cleared on the negative clock edge to stop the
decoding of instructions M+2 and M+3. In the third cycle, core I fetches instruction M+1 and,
at the same time, core II fetches instruction M+2. By the time instructions M+1 and M+2 are
decoded, instruction M has been executed. The correlation is thus resolved, because the
destination operand has already been produced.
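The fetch sequence described above can be reproduced with a short behavioural sketch. It models only RAW correlation between the two instructions fetched in the same cycle, as in the Figure 3 example. The `Instr` record, `fetch_trace` and the register names are hypothetical helpers, not part of the paper's design.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def has_raw(first: Instr, second: Instr) -> bool:
    """True if `second` reads the register that `first` writes."""
    return first.dest is not None and first.dest in second.srcs


def fetch_trace(program: List[Instr]) -> List[Tuple[Optional[str], Optional[str], bool]]:
    """Return one (core I fetch, core II fetch, squashed) entry per cycle."""
    def name_at(k: int) -> Optional[str]:
        return program[k].name if k < len(program) else None

    trace = []
    pc = 0
    while pc < len(program):
        trace.append((name_at(pc), name_at(pc + 1), False))
        paired = pc + 1 < len(program)
        if paired and has_raw(program[pc], program[pc + 1]):
            # The pair fetched while the correlation is detected is discarded,
            # and fetching restarts from the dependent instruction on core I.
            trace.append((name_at(pc + 2), name_at(pc + 3), True))
            pc += 1
        else:
            pc += 2
    return trace


# Hypothetical program: M+1 reads R1, which M writes.
program = [
    Instr("M", "R1", ("R2", "R3")),
    Instr("M+1", "R4", ("R1", "R5")),
    Instr("M+2", "R6", ("R7",)),
    Instr("M+3", "R8", ("R9",)),
]
trace = fetch_trace(program)
```

The resulting trace has the pair (M, M+1) in the first cycle, the squashed pair (M+2, M+3) in the second, and (M+1, M+2) in the third, matching the sequence in the text.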
3. Design of High-speed Algorithm and Low-power Architecture
The multiplier is one of the most important parts of this chip and lies on the critical path.
To a great extent, therefore, it determines the performance of the whole dual-core system.
The Booth algorithm is a popular way to reduce the number of partial products by
recoding the multiplier, while the Wallace tree architecture is an efficient method to compress
the partial products with a short carry delay. Both are widely applied to improve the
speed and power consumption of multipliers. However, the
traditional Booth3 algorithm has to generate the tripled partial product, which lengthens the
critical path and slows the multiplier. The traditional Wallace tree architecture, in turn, cannot
generate the carry and the pseudo sum synchronously, which causes unnecessary 0-1 transitions and
redundant dynamic power consumption.
This paper presents a novel design of the Wallace binary multiplier. It proposes a
redundant Booth3 algorithm to avoid the difficulty of generating the tripled partial product, and
a novel leapfrog Wallace tree architecture to generate the carry and the pseudo
sum synchronously, which eliminates the unnecessary 0-1 transitions and reduces the power
consumption of the multiplier. These improvements are applied to the multiplier for testing and
simulation. The simulation results show that they effectively improve the
performance and decrease the power consumption of the multiplier.
3.1. A Redundant Booth3 Multiplier for High Speed
A redundant Booth3 algorithm is studied in this paper to improve the multiplier's speed,
because this is an important way to improve the system performance.
In the Booth3 algorithm, an n-bit binary number X is recoded in blocks of 4-bit
scanning digits $x_{i+2}x_{i+1}x_{i}x_{i-1}$, each block being recoded according to the value of
$(-4x_{i+2} + 2x_{i+1} + x_{i} + x_{i-1})$. Ignoring the
overlapping bit between adjacent blocks, the recoding uses every three bits of the multiplier,
instead of every two as in the Booth2 algorithm, to generate one partial product, which
reduces the number of partial products and increases the speed. For the 64-bit multiplier, a zero
bit is appended after the least significant bit and a sign bit is placed before the most
significant bit. The binary number is then recoded block by block to generate the partial products.
Each partial product is selected from {0, ±M, ±2M, ±3M, ±4M}, where M stands for the
multiplicand. Each partial product is shifted 3 bits left or right relative to the previous one.
The direction is decided by the recoding sequence: the big-endian sequence shifts the partial
products right, while the little-endian sequence shifts them left. In
binary multiplication, 0M and 1M are obtained directly, while 2M and
4M are obtained by shifting the multiplicand left by one bit or two bits respectively. The
case of 3M is not as easy. It cannot be calculated directly as 2M+M because of the long
delay of a carry-propagate adder, especially when the multiplicand is wide, and it cannot
be obtained by shifting either.
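The recoding step can be illustrated with a short sketch of standard radix-8 (Booth3) recoding, in which each digit is $-4x_{i+2} + 2x_{i+1} + x_{i} + x_{i-1}$ with $x_{-1} = 0$. It is a behavioural model on Python integers under the assumption of a signed 64-bit multiplier. The function names are hypothetical, and the paper's redundant handling of 3M is not modelled here.

```python
from typing import List


def booth3_digits(x: int, n: int = 64) -> List[int]:
    """Recode a signed n-bit integer into radix-8 Booth digits, least significant first."""
    num_digits = -(-n // 3)  # ceil(n / 3), which is 22 for n = 64

    def bit(i: int) -> int:
        # Python's >> on negative integers is arithmetic, so bits above the
        # sign bit read as sign extension; the bit below the LSB is zero.
        return 0 if i < 0 else (x >> i) & 1

    digits = []
    for k in range(num_digits):
        i = 3 * k
        digits.append(-4 * bit(i + 2) + 2 * bit(i + 1) + bit(i) + bit(i - 1))
    return digits


def booth3_multiply(x: int, m: int, n: int = 64) -> int:
    """Sum the partial products d_k * M, each weighted by 8**k (3-bit left shifts)."""
    return sum(d * m * (8 ** k) for k, d in enumerate(booth3_digits(x, n)))


digits = booth3_digits(-7)
```

Every digit lies between -4 and 4, so each partial product is a multiple of the multiplicand from 0 up to 4M in magnitude. Of these, only 3M cannot be produced by shifting and negation alone.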
The 64-bit multiplier proposed for this chip uses the little-endian sequence, and
the number of partial products is reduced from 33 to 22. To generate the 3M partial product, an
adder composed of 4-bit adders operating in parallel is adopted; the carry-in signals are not
propagated between the adders but form another partial product. There are 8 carry-in signals for
each partial product, so the 22 partial products have 176 carry-in signals. For any given bit
weight there are at most 6 carry-in signals, so all 176 carry-in signals can be compressed
into 6 partial products. This shortens the carry delay of 3M generation to that of a 4-bit
carry chain instead of a 64-bit carry chain. However, it also introduces
6 more partial products. The effect of this is discussed in the next paragraph.
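The idea of keeping the block carries as a separate word instead of propagating them can be sketched as follows. This is an illustration for a non-negative multiplicand, with a hypothetical function name and an assumed block layout. The number of blocks here follows from the chosen width and does not reproduce the paper's count of 8 carry-in signals per partial product.

```python
from typing import Tuple


def triple_with_block_carries(m: int, width: int = 64, block: int = 4) -> Tuple[int, int]:
    """Split 3*m = m + 2*m into (block_sums, carries) without inter-block carry propagation.

    Assumes 0 <= m < 2**width. The two returned words add up to 3*m.
    """
    a, b = m, m << 1
    mask = (1 << block) - 1
    sums = 0
    carries = 0
    # 3*m fits in width + 2 bits, so cover that range block by block.
    for lo in range(0, width + 2, block):
        s = ((a >> lo) & mask) + ((b >> lo) & mask)   # independent 4-bit addition
        sums |= (s & mask) << lo                      # low bits stay in place
        carries |= (s >> block) << (lo + block)       # carry-out kept as a separate bit
    return sums, carries


sums, carries = triple_with_block_carries(0x123456789ABCDEF0)
```

The longest carry chain is then one block wide. The price is the extra carry word, which has to be added in the compression tree as a further partial product.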
In two's complement arithmetic, every partial product must have its sign
bit extended to the most significant bit, which increases the amount of data to be processed.
To avoid the extension, a fixed value called the counteractive digit is introduced as an
additional partial product to compensate for the missing sign bits. Figure 4 shows the amount
of data processed by the shift-add algorithm: 64+65+...+126+127 = 6112 bits.
The Booth2 algorithm has to process 67+69+...+129+131 = 3267 bits, as
shown in Figure 5. Figure 6 shows that the method presented in this paper processes
66×22+128+176 = 1756 bits. All three figures are based on a 64-bit multiplier.
Clearly, the third method, based on the Booth3 algorithm, has much less data to
process, which is good for realizing a 64-bit high-speed multiplier.
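The three bit counts quoted above can be checked with a few lines of arithmetic. The variable names are chosen here for illustration only.

```python
# Shift-add: 64 partial products whose widths grow from 64 to 127 bits.
shift_add_bits = sum(range(64, 128))

# Booth2: 33 partial products whose widths grow from 67 to 131 bits in steps of 2.
booth2_bits = sum(range(67, 132, 2))

# Modified Booth3: 22 partial products of 66 bits, plus the 128-bit and 176-bit
# terms from the sum quoted in the text.
booth3_bits = 66 * 22 + 128 + 176

print(shift_add_bits, booth2_bits, booth3_bits)  # prints "6112 3267 1756"
```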
Figure 4. The complement code algorithm
Figure 5. The Booth2 algorithm
Figure 6. The modified Booth3 algorithm
The synthesis results show that the modified Booth3 algorithm is 23% faster than
the traditional Booth2 algorithm, with both synthesized in 0.18 um CMOS
technology at 50 MHz.
3.2. A Leapfrog Wallace Architecture for Low Power
The power consumption of an integrated circuit comes mainly from the charging and
discharging current of the load capacitance, which is the dynamic power
$P = \frac{1}{2} C V^{2} E_{SW} F_{CLK}$
where $C$ is the load capacitance, $V$ is the power supply voltage, $E_{SW}$
is the switching (jump) frequency and $F_{CLK}$ is the operating clock frequency. For a given chip
technology, power supply voltage and clock frequency, power consumption can be
reduced by decreasing the number of switching logic cells.
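As a small numerical illustration of the formula, the sketch below evaluates the dynamic power for a set of made-up operating values and shows that, with C, V and the clock frequency fixed, the power scales linearly with the switching term. The function name and all numbers are assumptions for illustration, not measurements from the paper.

```python
def dynamic_power(c_load: float, v_dd: float, e_sw: float, f_clk: float) -> float:
    """Dynamic power P = 1/2 * C * V^2 * E_SW * F_CLK, in watts for SI inputs."""
    return 0.5 * c_load * v_dd ** 2 * e_sw * f_clk


# Hypothetical values: 1 nF switched capacitance, 1.8 V supply,
# switching factor 0.2, 50 MHz clock.
baseline = dynamic_power(1e-9, 1.8, 0.2, 50e6)

# Same circuit with 20% fewer transitions per cycle.
reduced = dynamic_power(1e-9, 1.8, 0.2 * 0.8, 50e6)

ratio = reduced / baseline
```

Here the ratio comes out at 0.8: removing a given fraction of the transitions removes the same fraction of the dynamic power.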
The Wallace tree is theoretically the fastest adder tree for multiplication. However, the carry
c and the pseudo sum s have different generation times, so the carry, which is generated faster,
must wait for the pseudo sum. As depicted in Figure 7, the pseudo
sum s takes 6 $t_d$ while the carry c takes 4 $t_d$, where $t_d$ stands for the delay of one
complex logic gate. The carry must therefore wait 2 $t_d$ for the pseudo sum in the traditional
Wallace tree. This slows the compression and causes unnecessary 0-1 transitions, which increase
the power consumption of the compressor. Furthermore, the irregularity of the Wallace tree
architecture increases the wiring delay and area.
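For reference, the logical function of a 4-2 compressor can be sketched with the textbook construction from two chained full adders. This is a functional model only: it assumes the common two-full-adder form rather than the gate-level circuit of Figure 7, it does not model the gate delays discussed above, and the function names are hypothetical.

```python
from itertools import product
from typing import Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """Return (sum, carry) of three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> Tuple[int, int, int]:
    """Compress four bits plus a carry-in into (sum, carry, cout).

    cout depends only on x1..x3, so it does not ripple along a row of compressors.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def compressor_is_correct() -> bool:
    """Check x1+x2+x3+x4+cin == sum + 2*(carry + cout) for all 32 input patterns."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True


ok = compressor_is_correct()
```

In this model the sum output passes through both full adders while the carry outputs each depend on one, which is consistent with the text's point that the carry is ready before the pseudo sum.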
Figure 7. The traditional 4-2 compressor
A new Wallace tree architecture, named the leapfrog Wallace tree and shown
in Figure 8-B, is presented in this paper to resolve the disadvantages
of the traditional one described above. It generates the carry and pseudo sum synchronously
through the leapfrog connection, which increases the compression speed of the partial products
and avoids the unnecessary 0-1 transitions of the traditional tree, decreasing the
instantaneous power consumption. The traditional
Wallace tree architecture is depicted in Figure 8-A. Compared with it, the leapfrog Wallace tree
has a 2 $t_d$ advantage in critical path delay if the wiring delay is not considered. However,
wiring delay accounts for most of the delay in deep-submicron CMOS technology. The
leapfrog Wallace tree architecture is much more regular and has shorter wires. It therefore
has a shorter wiring delay and lower instantaneous power.
Compared with the traditional Wallace tree, the synthesis results show that the leapfrog
Wallace tree is 26% faster, consumes 20% less power and occupies 13% less area,
with both synthesized in 0.18 um CMOS technology at 100 MHz.
A. Traditional Wallace Tree B. Leapfrog Wallace Tree
Figure 8. The architecture comparison of the Wallace Trees
Figure 9 and Figure 10 show the difference in instantaneous power consumption
between the two Wallace tree architectures. Both results are generated with the
same test vectors. The leapfrog Wallace tree clearly has only one peak in the
instantaneous power plot, while the traditional one has almost three. This means that the
leapfrog Wallace tree consumes less instantaneous power and is much more power-efficient than
the traditional one under the same conditions. The simulation results
agree with the preceding theoretical analysis: the leapfrog Wallace
tree architecture avoids the unnecessary 0-1 transitions and thereby improves the instantaneous
power efficiency.
Figure 9. Instantaneous power of traditional Wallace Tree
Figure 10. Instantaneous power of leapfrog Wallace Tree
4. Synthesis and Simulation
4.1. Synthesis with the Synopsys Design Compiler
Both the previous 64-bit dual-core microprocessor and the new 64-bit dual-core
microprocessor with the novel shared register module, the novel Booth multiplier and the novel
Wallace tree architecture were synthesized with a 0.18 um CMOS library using Synopsys
Design Compiler. The synthesis results show that the new dual-core microprocessor has a
worst-case critical path delay of about 9.637 ns in 0.18 um CMOS technology, while the previous
design has a worst-case critical path delay of almost 13.187 ns with the same CMOS library. The
new design is thus about 26.9% faster than the previous design.
The power consumption of the new design reported by Design Compiler is
1204.38 mW at 50 MHz, while that of the previous design is 1400.44 mW at
50 MHz. The new 64-bit dual-core microprocessor with the modified Booth3
algorithm and completely parallel Wallace tree structure therefore consumes about 14% less power
at 50 MHz in the same 0.18 um CMOS technology than the previous design based on the
Booth2 algorithm and the traditional Wallace tree.
4.2. FPGA Simulation Result
The 64-bit dual-core microprocessor is simulated on the Altera Stratix III
EP3SL150F780C4N FPGA device. Quartus II 12.1sp1 was used to generate the simulation
results. The power reports show that the new 64-bit dual-core microprocessor with the novel
Booth multiplier and the novel Wallace tree architecture consumes only 1269.48 mW at
50 MHz. The previous 64-bit dual-core microprocessor, on the other hand, consumes 1447.72 mW
at 50 MHz. The improvements allow the new 64-bit dual-core microprocessor to
save almost 14% power at 50 MHz compared with the previous design, and this result
coincides with the synthesis result from Synopsys Design Compiler.
The FPGA simulation also reports the maximum pad-to-pad delay. The maximum
pad-to-pad delay of the new 64-bit dual-core microprocessor is 14.751 ns, while that
of the previous design is almost 19.574 ns. The new 64-bit dual-core
microprocessor is thus almost 25% faster than the previous design. Its frequency
reaches up to 66.6 MHz on the EP3SL150F780C4N FPGA device.
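The relative improvements can be recomputed from the delay and power figures reported in Sections 4.1 and 4.2. The sketch below uses a hypothetical helper and labels chosen here, and takes the improvement as (old - new) / old. On that definition the printed FPGA power figures give roughly 12.3%, somewhat below the quoted "almost 14%", while the other three values match the text.

```python
def relative_reduction(old: float, new: float) -> float:
    """Fraction by which `new` is lower than `old`."""
    return (old - new) / old


# (previous design, new design), as reported in the text.
reported = {
    "synthesis delay (ns)": (13.187, 9.637),
    "synthesis power (mW)": (1400.44, 1204.38),
    "FPGA power (mW)": (1447.72, 1269.48),
    "FPGA pad-to-pad delay (ns)": (19.574, 14.751),
}

reductions = {name: relative_reduction(old, new) for name, (old, new) in reported.items()}

for name, r in reductions.items():
    print(f"{name}: {100 * r:.1f}% lower")
```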
5. Conclusion
A 64-bit dual-core microprocessor is proposed in this paper. Its architecture is based
on the novel shared register module. Its performance is improved by the modified Booth3 algorithm,
and its power consumption is reduced by the completely parallel Wallace tree structure. The
simulation results indicate that the power consumption is decreased by 14% and the longest
data path is shortened by 25% compared with the previous design.
References
[1] M Schoeberl, S Abbaspour, B Akesson, N Audsley, R Capasso. T-CREST: Time-predictable
multi-core architecture for embedded systems. Journal of Systems Architecture. 2015; 61(9): 449-471.
[2] S Pagani, JJ Chen, M Li. Energy Efficiency on Multi-Core Architectures with Multiple Voltage Islands.
IEEE Transactions on Parallel & Distributed Systems. 2015; 26(6): 1-1.
[3] T Pimpalkhute, S Pasricha. NoC Scheduling for Improved Application-Aware and Memory-Aware
Transfers in Multi-core Systems. Proceedings of the 2014 27th International Conference on VLSI
Design and 2014 13th International Conference on Embedded Systems. 2015; 26(6).
[4] Cheng YL, Min CH, Rong GC. Instruction scheduling and transformation for a VLIW unified reduced
instruction set computer/digital signal processor processor with shared register architecture.
Concurrency and Computation: Practice and Experience. 2014; 26(1): 134–151.
[5] J Zhang, S You, L Gruenwald. Parallel online spatial and temporal aggregations on multi-core
CPUs and many-core GPUs. Information Systems. 2014; 44: 134–154.
[6] Z Yang, FP Wu, JR Dong, RD Heng. Optimization of Power System Scheduling Based on Shuffled
Complex Evolution Metropolis Algorithm. TELKOMNIKA (Telecommunication Computing Electronics
and Control). 2015; 13(2): 413-420.
[7] Ravi N, Subbaiah Y, Prasad TJ, et al. A novel low power, low area array multiplier design for DSP
applications. Signal Processing, Communication, Computing and Networking Technologies
(ICSCCN), 2011 International Conference on. IEEE. 2011: 254-257.
[8] Sivanantham, S Padmavathy, M Divyanga, S Lincy, PV Anitha. System-On-a-Chip Test Data
Compression and Decompression with Reconfigurable Serial Multiplier. International Journal of
Engineering & Technology. 2013; 5(2): 973.
[9] SK Chen, CW Liu, TY Wu. Design and Implementation of High-Speed and Energy-Efficient Variable-
Latency Speculating Booth Multiplier (VLSBM). Circuits and Systems I: Regular Papers, IEEE
Transactions on. 2013; 60(10).
[10] AZ Jidin, T Sutikno. FPGA Implementation of Low-Area Square Root Calculator. TELKOMNIKA
(Telecommunication Computing Electronics and Control). 2015; 13(4): 1145-1152.
[11] A Sathya, S Fathimabee, S Divya. Parallel multiplier-accumulator based on radix-2 modified Booth
algorithm by using a VLSI architecture. Electronics and Communication Systems (ICECS), 2014
International Conference on 13-14 Feb. 2014.

Contenu connexe

Tendances

Final Project Report
Final Project ReportFinal Project Report
Final Project ReportRiddhi Shah
 
Assignement 3 ADV report (1)
Assignement 3 ADV report (1)Assignement 3 ADV report (1)
Assignement 3 ADV report (1)Riddhi Shah
 
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible GateIRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible GateIRJET Journal
 
Multiple Valued Logic for Synthesis and Simulation of Digital Circuits
Multiple Valued Logic for Synthesis and Simulation of Digital CircuitsMultiple Valued Logic for Synthesis and Simulation of Digital Circuits
Multiple Valued Logic for Synthesis and Simulation of Digital CircuitsIJERA Editor
 
High Speed Signed multiplier for Digital Signal Processing Applications
High Speed Signed multiplier for Digital Signal Processing ApplicationsHigh Speed Signed multiplier for Digital Signal Processing Applications
High Speed Signed multiplier for Digital Signal Processing ApplicationsIOSR Journals
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Fpga based efficient multiplier for image processing applications using recur...
Fpga based efficient multiplier for image processing applications using recur...Fpga based efficient multiplier for image processing applications using recur...
Fpga based efficient multiplier for image processing applications using recur...VLSICS Design
 
cis97007
cis97007cis97007
cis97007perfj
 
Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362Editor IJARCET
 
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...IRJET Journal
 
Iaetsd low power flip flops for vlsi applications
Iaetsd low power flip flops for vlsi applicationsIaetsd low power flip flops for vlsi applications
Iaetsd low power flip flops for vlsi applicationsIaetsd Iaetsd
 
Review On 2:4 Decoder By Reversible Logic Gates For Low Power Consumption
Review On 2:4 Decoder By Reversible Logic Gates For Low Power ConsumptionReview On 2:4 Decoder By Reversible Logic Gates For Low Power Consumption
Review On 2:4 Decoder By Reversible Logic Gates For Low Power ConsumptionIRJET Journal
 
Design of Quaternary Logical Circuit Using Voltage and Current Mode Logic
Design of Quaternary Logical Circuit Using Voltage and Current Mode LogicDesign of Quaternary Logical Circuit Using Voltage and Current Mode Logic
Design of Quaternary Logical Circuit Using Voltage and Current Mode LogicVLSICS Design
 
ADS Lab 5 Report
ADS Lab 5 ReportADS Lab 5 Report
ADS Lab 5 ReportRiddhi Shah
 
Fpga based low power and high performance address
Fpga based low power and high performance addressFpga based low power and high performance address
Fpga based low power and high performance addresseSAT Publishing House
 
Fpga based low power and high performance address generator for wimax deinter...
Fpga based low power and high performance address generator for wimax deinter...Fpga based low power and high performance address generator for wimax deinter...
Fpga based low power and high performance address generator for wimax deinter...eSAT Journals
 
Analysis of signal transition
Analysis of signal transitionAnalysis of signal transition
Analysis of signal transitioncsandit
 

Tendances (19)

Final Project Report
Final Project ReportFinal Project Report
Final Project Report
 
Assignement 3 ADV report (1)
Assignement 3 ADV report (1)Assignement 3 ADV report (1)
Assignement 3 ADV report (1)
 
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible GateIRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
IRJET- Review Paper on Radix-2 DIT Fast Fourier Transform using Reversible Gate
 
Multiple Valued Logic for Synthesis and Simulation of Digital Circuits
Multiple Valued Logic for Synthesis and Simulation of Digital CircuitsMultiple Valued Logic for Synthesis and Simulation of Digital Circuits
Multiple Valued Logic for Synthesis and Simulation of Digital Circuits
 
High Speed Signed multiplier for Digital Signal Processing Applications
High Speed Signed multiplier for Digital Signal Processing ApplicationsHigh Speed Signed multiplier for Digital Signal Processing Applications
High Speed Signed multiplier for Digital Signal Processing Applications
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Fpga based efficient multiplier for image processing applications using recur...
Fpga based efficient multiplier for image processing applications using recur...Fpga based efficient multiplier for image processing applications using recur...
Fpga based efficient multiplier for image processing applications using recur...
 
Research Paper
Research PaperResearch Paper
Research Paper
 
cis97007
cis97007cis97007
cis97007
 
Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362
 
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
 
Iaetsd low power flip flops for vlsi applications
Iaetsd low power flip flops for vlsi applicationsIaetsd low power flip flops for vlsi applications
Iaetsd low power flip flops for vlsi applications
 
Review On 2:4 Decoder By Reversible Logic Gates For Low Power Consumption
Review On 2:4 Decoder By Reversible Logic Gates For Low Power ConsumptionReview On 2:4 Decoder By Reversible Logic Gates For Low Power Consumption
Review On 2:4 Decoder By Reversible Logic Gates For Low Power Consumption
 
Design of Quaternary Logical Circuit Using Voltage and Current Mode Logic
Design of Quaternary Logical Circuit Using Voltage and Current Mode LogicDesign of Quaternary Logical Circuit Using Voltage and Current Mode Logic
Design of Quaternary Logical Circuit Using Voltage and Current Mode Logic
 
ADS Lab 5 Report
ADS Lab 5 ReportADS Lab 5 Report
ADS Lab 5 Report
 
Gn3311521155
Gn3311521155Gn3311521155
Gn3311521155
 
Fpga based low power and high performance address
Fpga based low power and high performance addressFpga based low power and high performance address
Fpga based low power and high performance address
 
Fpga based low power and high performance address generator for wimax deinter...
Fpga based low power and high performance address generator for wimax deinter...Fpga based low power and high performance address generator for wimax deinter...
Fpga based low power and high performance address generator for wimax deinter...
 
Analysis of signal transition
Analysis of signal transitionAnalysis of signal transition
Analysis of signal transition
 

Research of 64-bits RISC Dual-core Microprocessor with High Performance and Low Power Consumption

TELKOMNIKA, Vol. 16, No. 2, April 2018, pp. 463~470
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v16i2.4153
Received November 5, 2016; Revised December 19, 2016; Accepted March 12, 2018

Gang Zou*, Zhibiao Shao, Linghao Li
Electron and Information Engineering College, Xi'an Jiaotong University
No. 28, Xianning West Road, Xi'an, Shaanxi, 710049, P.R. China
*Corresponding author, e-mail: 99887406@qq.com

Abstract
A 64-bit RISC dual-core microprocessor with high performance and low power consumption is presented in this paper. The processor has a symmetric architecture with two cores, each with a three-stage pipeline, a 64-bit data path and a 64-bit address port. A novel shared register module, a redundant Booth3 algorithm and a leapfrog Wallace tree architecture are introduced into the microprocessor, and both its performance and its power consumption are improved considerably. The FPGA simulation results indicate that the power consumption is decreased by 14% and the longest data path is shortened by 25%.

Keywords: Dual-core, Booth algorithm, Wallace tree

Copyright © 2018 Universitas Ahmad Dahlan. All rights reserved.

1. Introduction
In terms of processing ability, the instruction and data length of microprocessors has gradually grown from 32 bits to 64 bits to meet growing computational requirements [1]. In terms of overall architecture, on the one hand, RISC (Reduced Instruction Set Computing) has become the main trend, replacing CISC (Complex Instruction Set Computer), because of its concise instruction set and efficient implementation. On the other hand, multi-core microprocessors have become the focus of research and commercial development, replacing single-core designs to avoid the power consumption and other problems that arise as line widths approach their ultimate limit and integration levels increase. In general, microprocessor design is tending toward long word lengths, reduced instruction sets and single-chip multi-core.

This paper describes the design of a 64-bit RISC dual-core microprocessor. A simple and effective dual-core architecture and control mechanism is achieved by using a novel shared register module. High processing speed is achieved by using a redundant Booth3 algorithm. Dynamic power consumption is decreased by using a leapfrog Wallace tree architecture. The microprocessor satisfies the needs of high-performance, low-power applications.

2. Architecture of Dual-core
The structure of the symmetric dual-core microprocessor designed in this paper is shown in Figure 1, and the structure of a single core in the chip is shown in Figure 2. Resource sharing and data exchange are achieved effectively through the shared register module.

For a single-task program, the instruction sequence has a fixed order. Depending on the order in which two instructions read and write the same register, there are four kinds of correlation, described below (assume two instructions i and j, with instruction i fetched before instruction j).
Figure 1. Architecture of dual-core (blocks: pipeline controlling, Core I, Core II, Shared Reg Stack, I/O Port, Shared Cache In Chip)

Figure 2. Architecture of single-core (blocks: decoding and controlling unit, multiplexer and address latch, address adder, 64-bit address port, PC register, registers, division register, bus preselection unit, arithmetic logic, multiplier, field compression, field extraction, field expansion, bus arbitration unit, barrel shifter, JTAG unit, instruction register, instruction cache, data output latch, 64-bit data input and output ports; buses A, B and C; signals NM, INT, CLK, RST, OE, WE)

Correlation of RAW (Read After Write). Instruction j reads source register Rx before instruction i has written its result to destination register Rx. Instruction j therefore gets a stale operand, which is wrong.

Correlation of WAW (Write After Write). Instructions i and j write the same destination register Rx, but the write of instruction j occurs earlier than, or at the same time as, the write of instruction i. The writes then happen in the wrong order, so the value of Rx becomes indeterminate or comes from instruction i instead of instruction j.

Correlation of WAR (Write After Read). Instruction j writes its result to destination register Rx before instruction i reads source register Rx. Instruction i therefore gets the new value, which is the wrong operand.

Correlation of RAR (Read After Read). Instructions i and j read the same source register Rx. This situation does not lead to a data race.

Figure 3. RAW correlation

As shown above, all four kinds of correlation are caused by operations on the same register. RAR correlation obviously does not lead to a data race. WAR correlation does not need to be handled either. This dual-core uses a sequential instruction issue strategy, and the pipeline of each core reads source operands in the instruction decoding stage and writes the destination operand in the execution stage. Reading a source operand therefore always occurs before a later instruction writes its destination operand, so WAR correlation cannot arise. Because of the sequential issue strategy, the instruction execution order is the same as the program order. WAW correlation can therefore be avoided simply by writing the result of instruction j, the later instruction, to the destination register when two adjacent instructions write the same register. The only correlation that needs to be handled is RAW.

The RAW data race is solved as follows. There are two kinds of RAW correlation, depending on whether the correct value to be written to the register has already been produced when instruction j reads the operand in the decoding stage. If the correct value has been produced, there is no correlation between the two cores, because neither core has yet entered the execution stage. Only the RAW correlation in which the correct value has not been produced when instruction j reads the operand in the decoding stage needs to be handled. As shown in Figure 3, assume that there is a RAW correlation between instructions M and M+1, which means that the destination operand of instruction M is a source operand of instruction M+1.
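The register correlations defined above, including the RAW case between instructions M and M+1, can be sketched in Python as follows. This is only an illustrative model, not the processor's control logic: the `Instr` record, the register names and the helper `needs_handling` are hypothetical, and each instruction is assumed to have at most one destination register.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: one optional destination, some sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()


def correlations(i: Instr, j: Instr) -> FrozenSet[str]:
    """Classify register correlations between i (earlier) and j (later)."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")  # j reads what i writes
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")  # both write the same register
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")  # j writes what i reads
    if set(i.srcs) & set(j.srcs):
        found.add("RAR")  # both read the same register (harmless)
    return frozenset(found)


def needs_handling(i: Instr, j: Instr) -> bool:
    """Per the text, with in-order issue only RAW requires intervention."""
    return "RAW" in correlations(i, j)


# Instruction M writes R3; instruction M+1 reads R3 (register names assumed).
m = Instr(dest="R3", srcs=("R1", "R2"))
m_plus_1 = Instr(dest="R5", srcs=("R3", "R4"))
print(sorted(correlations(m, m_plus_1)))  # ['RAW']
print(needs_handling(m, m_plus_1))        # True
```

Only the RAW result triggers any action in this sketch, which mirrors the argument that WAR, WAW and RAR need no special handling under sequential issue.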
In the first period, Core I fetches instruction M and, at the same time, Core II fetches instruction M+1. In the second period, Core I and Core II fetch instructions M+2 and M+3, and at the same time the RAW correlation is detected in the decoding stage. At this point the instruction registers are cleared on the negative clock edge to stop the decoding of instructions M+2 and M+3. In the third period, Core I fetches instruction M+1 and, at the same time, Core II fetches instruction M+2. By the time instructions M+1 and M+2 are decoded, instruction M has been executed. The correlation is thus resolved, because the destination operand has been produced.

(Figure 3 diagram: fetch, decode and execute stages of instructions M to M+3 on Core I and Core II, followed by the re-fetched instructions M+1 on Core I and M+2 on Core II; cancelled stages are marked X.)

3. Design of High-speed Algorithm and Low-power Architecture
The multiplier is one of the most important parts of this chip and lies on the critical path. To a great extent it therefore determines the performance of the whole dual-core system. The Booth algorithm is a popular way to reduce the number of partial products by recoding the multiplier, while the Wallace tree architecture is an efficient way to compress the partial products with a short carry delay. Both are widely used to improve multiplier speed and power consumption. However, the traditional Booth3 algorithm has to generate the tripled partial product (3M), which lengthens the critical path and slows the multiplier. The traditional Wallace tree architecture, in turn, cannot generate the carry and the pseudo sum synchronously, which causes unnecessary 0-1 jumps and redundant dynamic power consumption.

This paper presents a novel design of the Wallace binary multiplier. It proposes a redundant Booth3 algorithm to avoid the difficulty of generating the tripled partial product, and a novel leapfrog Wallace tree architecture that generates the carry and the pseudo sum synchronously, which eliminates the unnecessary 0-1 jumps and reduces the power consumption of the multiplier. These improvements were applied to a multiplier for testing and simulation. The simulation results show that they are effective in improving the performance and decreasing the power consumption of the multiplier.

3.1. A Redundant Booth3 Multiplier for High Speed
A redundant Booth3 algorithm is studied in this paper to improve the multiplier's speed, which is an important way to improve system performance. In the Booth3 algorithm, an n-bit binary number X is recoded in blocks with a 4-bit scanning window $x_{i+2}x_{i+1}x_{i}x_{i-1}$, according to the value of $-4x_{i+2} + 2x_{i+1} + x_{i} + x_{i-1}$. Ignoring the bit shared between adjacent blocks, the recoding uses every three bits of the multiplier, instead of every two as in the Booth2 algorithm, to generate one partial product, which reduces the number of partial products and increases the speed. For the 64-bit multiplier, a zero bit is appended after the least significant bit and a sign bit is placed before the most significant bit. The binary number is then recoded block by block to generate the partial products. Each partial product is selected from the multiples {0, M, 2M, 3M, 4M} or their negatives, where M stands for the multiplicand. Each partial product is shifted 3 bits left or right relative to the one before it.
Whether the shift is to the left or to the right is decided by the recoding sequence: a big-endian sequence shifts the partial products right, while a little-endian sequence shifts them left. In binary multiplication, 0M and 1M are available directly, and 2M and 4M are obtained by shifting the multiplicand left by one bit or two bits respectively. 3M is not as easy. It cannot be calculated directly as 2M+M without the long delay of a carry-propagating adder, especially when the multiplicand is wide, and it cannot be obtained by shifting either.

The 64-bit multiplier proposed for this chip uses the little-endian sequence, and the number of partial products is reduced from 33 to 22. To generate the 3M partial product, an adder composed of 4-bit adders operating in parallel is adopted; the carry-in signals are not passed between the adders but instead form another partial product. There are 8 carry-in signals for each partial product, so the 22 partial products have 176 carry-in signals. For any given weight there are at most 6 carry-in signals, so all 176 carry-in signals can be compressed into 6 partial products. This reduces the carry delay of 3M generation from that of a 64-bit carry chain to that of a 4-bit carry chain. However, it also introduces 6 more partial products. Their effect is discussed in the next paragraph.

In two's-complement arithmetic, every partial product must have its sign bit extended to the most significant bit, which increases the amount of data to be processed. To avoid this extension, a constant called the counteractive digit is introduced as an additional partial product to compensate for the missing sign bits. Figure 4 shows the amount of data processed by the shift-add algorithm, which is 64+65+...+126+127 = 6112 bits. The Booth2 algorithm has to process 67+69+...+129+131 = 3267 bits, as shown in Figure 5. Figure 6 shows that the method proposed in this paper processes 66×22+128+176 = 1756 bits. All three figures are based on a 64-bit multiplier. The third method, based on the Booth3 algorithm, clearly has much less data to process, which is favorable for realizing a high-speed 64-bit multiplier.
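The recoding rule and the bit counts quoted above can be checked with a short Python sketch. It implements plain radix-8 (Booth3) recoding with digits in the range -4 to 4, and it assumes a signed multiplier that fits in n bits. It does not model the redundant 3M generation with parallel 4-bit adders or the counteractive digit; the function names are illustrative only.

```python
def booth3_digits(x: int, n: int = 64) -> list:
    """Radix-8 Booth digits of the signed n-bit multiplier x, least significant first."""
    if not -(1 << (n - 1)) <= x < (1 << (n - 1)):
        raise ValueError("x does not fit in n signed bits")
    groups = (n + 2) // 3  # 22 blocks for n = 64

    def bit(k: int) -> int:
        # Bit -1 is the appended zero; bits above n-1 repeat the sign bit,
        # because Python's >> on a negative int is an arithmetic shift.
        return 0 if k < 0 else (x >> k) & 1

    return [
        -4 * bit(3 * j + 2) + 2 * bit(3 * j + 1) + bit(3 * j) + bit(3 * j - 1)
        for j in range(groups)
    ]


def booth3_multiply(x: int, m: int, n: int = 64) -> int:
    """Sum the partial products d*M, each shifted 3 bits further left."""
    return sum((d * m) << (3 * j) for j, d in enumerate(booth3_digits(x, n)))


print(len(booth3_digits(12345)))           # 22
print(booth3_multiply(-12345, 6789))       # -83810205

# Bit counts quoted in the text for the three schemes.
print(sum(range(64, 128)))                 # 6112 (shift-add)
print(sum(range(67, 132, 2)))              # 3267 (Booth2)
print(66 * 22 + 128 + 176)                 # 1756 (method in the text)
```

Each digit selects one of the multiples 0, ±M, ±2M, ±3M or ±4M, which is why 3M has to be generated at all.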
Figure 4. The complement code algorithm (64 partial products with sign extension)

Figure 5. The Booth2 algorithm (33 partial products with sign extension)

Figure 6. The modified Booth3 algorithm (22 partial products, the counteractive digit and 6 rows of carry-in signals)

The synthesis results show that the modified Booth3 algorithm is 23% faster than the traditional Booth2 algorithm, with both synthesized in a 0.18 µm CMOS technology at 50 MHz.

3.2. A Leapfrog Wallace Architecture for Low Power
The power consumption of an integrated circuit comes mainly from the charging and discharging current of the load capacitance, which is the dynamic power

$$P = \frac{1}{2} C V^{2} E_{SW} F_{CLK}$$

where $C$ is the load capacitance, $V$ is the power supply voltage, $E_{SW}$ is the jump (switching) frequency and $F_{CLK}$ is the service (clock) frequency. For the same chip technology, power supply voltage and service frequency, power consumption can be reduced by decreasing the number of logic cells that jump.

The Wallace tree is theoretically the fastest adder tree for multiplication. However, the carry c and the pseudo sum s are generated at different times, so the carry, which is generated faster, must wait for the pseudo sum. As depicted in Figure 7, the pseudo sum s takes 6 td while the carry c takes 4 td, where td stands for the delay of one complex logic gate. In the traditional Wallace tree the carry therefore waits 2 td for the pseudo sum. This slows the compression and causes unnecessary 0-1 jumps, which increase the power consumption of the compressor. Furthermore, the irregularity of the Wallace tree architecture increases the wiring delay and area.
Figure 7. The traditional 4-2 compressor

A new Wallace tree architecture, named the leapfrog Wallace tree and shown in Figure 8-B, is presented in this paper to resolve the disadvantages of the traditional one described above. It generates the carry and the pseudo sum synchronously through the leapfrog connection, which increases the compression speed of the partial products and avoids the unnecessary 0-1 jumps of the traditional architecture, decreasing the instantaneous power consumption. The traditional Wallace tree architecture is depicted in Figure 8-A. Compared with it, the leapfrog Wallace tree has a 2 td advantage in critical path delay if the wiring delay is not considered. However, wiring delay accounts for most of the delay in deep submicron CMOS technologies. The leapfrog Wallace tree architecture is much more regular and has shorter wires, so it has a shorter wiring delay and lower instantaneous power. Compared with the traditional Wallace tree, the synthesis results show that the leapfrog Wallace tree is 26% faster, consumes 20% less power and occupies 13% less area, with both synthesized in a 0.18 µm CMOS technology at 100 MHz.

Figure 8. The architecture comparison of the Wallace trees: A. Traditional Wallace Tree; B. Leapfrog Wallace Tree (labeled "Salutatory Wallace Tree" in the figure)

Figure 9 and Figure 10 show the difference in instantaneous power consumption between the two Wallace tree architectures. Both results are generated with the same test vectors. The leapfrog Wallace tree clearly has only one peak in the instantaneous power plot, while the traditional one has almost three. This means that the leapfrog Wallace tree consumes less instantaneous power and is much more power-efficient than the traditional one when both work under the same conditions.
The simulation results agree with the preceding theoretical analysis: the leapfrog Wallace tree architecture avoids the unnecessary 0-1 jumps and thereby improves the instantaneous power efficiency.
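For readers unfamiliar with the building block in Figure 7, the arithmetic of a 4-2 compressor can be sketched in Python. This assumes the common textbook construction from two cascaded full adders, which may differ in gate-level detail from the compressor in the figure, and it says nothing about the leapfrog connection or about timing. The function names are illustrative.

```python
from itertools import product


def full_adder(a: int, b: int, c: int):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int):
    """Textbook 4-2 compressor: returns (pseudo sum, carry, cout)."""
    s1, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)  # second stage absorbs x4 and cin
    return s, carry, cout


# Exhaustive check: the five input bits are preserved as s + 2*(carry + cout).
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
print("4-2 compressor invariant holds for all 32 input combinations")
```

Because `cout` is produced by the first stage alone, a row of such compressors has no carry chain running along the row. The pseudo sum passes through both stages, which is consistent with the text's observation that it arrives later than the carry.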
Figure 9. Instantaneous power of traditional Wallace Tree

Figure 10. Instantaneous power of leapfrog Wallace Tree

4. Synthesis and Simulation
4.1. Synthesis with the Synopsys Design Compiler
Both the previous 64-bit dual-core microprocessor and the new 64-bit dual-core microprocessor, with the novel shared register model, the novel Booth multiplier and the novel Wallace tree architecture, were synthesized with a 0.18 µm CMOS library using the Synopsys Design Compiler. The synthesis results show that the new dual-core microprocessor has a worst-case critical path delay of about 9.637 ns in the 0.18 µm CMOS technology, while the previous design has a worst-case critical path delay of almost 13.187 ns with the same CMOS library. The new design is therefore about 26.9% faster than the previous one. The power consumption reported by the Design Compiler is 1204.38 mW at 50 MHz for the new design and 1400.44 mW at 50 MHz for the previous design. The new 64-bit dual-core microprocessor, with the modified Booth3 algorithm and the completely parallel Wallace tree structure, therefore saves about 14% of the power consumption at 50 MHz in the same 0.18 µm CMOS technology compared with the previous design based on the Booth2 algorithm and the traditional Wallace tree.

4.2. FPGA Simulation Result
The 64-bit dual-core microprocessor is simulated on the Altera Stratix III EP3SL150F780C4N FPGA device. Quartus II 12.1sp1 was used to generate the simulation results. The power reports show that the new 64-bit dual-core microprocessor, with the novel Booth multiplier and the novel Wallace tree architecture, consumes only 1269.48 mW at 50 MHz. The previous 64-bit dual-core microprocessor, on the other hand, consumes 1447.72 mW at 50 MHz.
According to the paper, the improvements let the 64-bit dual-core microprocessor save almost 14% of the power consumption at 50 MHz compared with the previous design, and this result coincides with the result of synthesis with the Synopsys Design Compiler. The FPGA simulation also reports the maximum pad-to-pad delay. The maximum pad-to-pad delay of the new 64-bit dual-core microprocessor is 14.751 ns, while that of the previous design is almost 19.574 ns. The new 64-bit dual-core microprocessor is therefore almost 25% faster than the previous design. Its frequency reaches 66.6 MHz on the EP3SL150F780C4N FPGA device.

5. Conclusion
A 64-bit dual-core microprocessor is proposed in this paper. Its architecture is based on the novel shared register model. Its performance is improved by a modified Booth3 algorithm, and its power consumption is optimized by a completely parallel Wallace tree structure. The simulation results indicate that the power consumption is decreased by 14% and the longest data path is shortened by 25% compared with the previous design.

References
[1] M Schoeberl, S Abbaspour, B Akesson, N Audsley, R Capasso. T-CREST: Time-predictable multi-core architecture for embedded systems. Journal of Systems Architecture. 2015; 61(9): 449-471.
[2] S Pagani, JJ Chen, M Li. Energy Efficiency on Multi-Core Architectures with Multiple Voltage Islands. IEEE Transactions on Parallel & Distributed Systems. 2015; 26(6): 1-1.
[3] T Pimpalkhute, S Pasricha. NoC Scheduling for Improved Application-Aware and Memory-Aware Transfers in Multi-core Systems. Proceedings of the 2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems. 2015; 26(6).
[4] Cheng YL, Min CH, Rong GC. Instruction scheduling and transformation for a VLIW unified reduced instruction set computer/digital signal processor processor with shared register architecture. Concurrency and Computation: Practice and Experience. 2014; 26(1): 134-151.
[5] J Zhang, S You, L Gruenwald. Parallel online spatial and temporal aggregations on multi-core CPUs and many-core GPUs. Information Systems. 2014; 44: 134-154.
[6] Z Yang, FP Wu, JR Dong, RD Heng. Optimization of Power System Scheduling Based on Shuffled Complex Evolution Metropolis Algorithm. TELKOMNIKA (Telecommunication Computing Electronics and Control). 2015; 13(2): 413-420.
[7] Ravi N, Subbaiah Y, Prasad TJ, et al. A novel low power, low area array multiplier design for DSP applications. Signal Processing, Communication, Computing and Networking Technologies (ICSCCN), 2011 International Conference on. IEEE. 2011: 254-257.
[8] Sivanantham, S Padmavathy, M Divyanga, S Lincy, PV Anitha. System-On-a-Chip Test Data Compression and Decompression with Reconfigurable Serial Multiplier. International Journal of Engineering & Technology. 2013; 5(2): 973.
[9] SK Chen, CW Liu, TY Wu. Design and Implementation of High-Speed and Energy-Efficient Variable-Latency Speculating Booth Multiplier (VLSBM). Circuits and Systems I: Regular Papers, IEEE Transactions on. 2013; 60(10).
[10] AZ Jidin, T Sutikno. FPGA Implementation of Low-Area Square Root Calculator. TELKOMNIKA (Telecommunication Computing Electronics and Control). 2015; 13(4): 1145-1152.
[11] A Sathya, S Fathimabee, S Divya. Parallel multiplier-accumulator based on radix-2 modified Booth algorithm by using a VLSI architecture. Electronics and Communication Systems (ICECS), 2014 International Conference on. 13-14 Feb. 2014.