SlideShare une entreprise Scribd logo
1  sur  4
A Hybrid Radix-4/Radix-8 LOWPower, High
Speed Multiplier Architecture for Wide Bit Widths
Brian S. Cherkauerl and Eby G. Friedrnan2
1Intel Corporation
2200 Mission College Blvd.
Santa Clara California 95052
Abstract – A hybrid radix-4/radix-8 architecture targeted for
high bit multipliers is presented as a compromise between
the high speed of a radix-4 multiplier architecture and the
low power dissipation of a radix-8 multiplier architecture. In
this hybrid radix-4/radix-8 multiplier architecture, the perfor-
mance bottleneck of a radix-8 multiplier, the generation of
three times the multiplicand for use in generating the radix-8
partial product, is performed in parallel with the reduction
of the radix+ partird products rather than serially, as in a
radix-8 multiplier. This hybrid radix+radix-8 multiplier ar-
chitecture requires 13% less power for a 64 x 64 bit multi-
plier, and results in only a 9% increase in delay, as com-
“pared with a radix~ implementation, When supply voltage
is sealed such that all multipliers exhibit the same delay, the
64 x 64 bit hybrid radixJVradix-8 multiplier dissipates less
power than either the radix-4 or radix-8 multipliers. The hy-
brid radix-4/radix-8 amhiteeture is therefore appropriate for
those applications that must dissipate minimal power and
operate at high speeds.
I. Introduction
High speed multipliers are fundamental elements in sig-
nal processing and arithmetic based systems. The higher bit
widths required of modem multipliers provide the opportu-
nity to explore new architectures which would be impmcticrd
for smaller bit widfh multiplication. Architectures for circuit
elements historically were designed to operate at maximum
speed, notwithstanding the resulting power dissipation. Re-
eently, gmter emphasis has been placed on reducing the
power dissipation of important circuit functions while main-
taining these high speeds. Therefore, power dissipation as
well as circuit speed should be considered at the architec-
tural level.
A leveling off of the power factor, the power dissipated
per bit2.Hz, and henee the power efficiency, has recently
been observed [1]. This leveling of the power factor is
illustrated in Figure 1. This trend leads to the conclusion
that to fitrt.her improve the power efficiency of multiplied,
power dissipation must be addressed at the architectural level
as well as at the circuit level.
The data in Figure 1 present the power factors for a
number of recent multiplier implementations. Sharma er
al. utilized Booth radix-4 eneoding along with a reduction
array of carry save adders (CSAS) generated by a recursive
algorithm to produce the 16 x 16-bit multiplier in [2]. In [3],
Yano ef al. introduced the complementary pass-transistor
logic family (CPL) and implemented a 16x 16-bit multiplier
in CPL which used no encoding but did use a Wallace tree for
partial product reduction. Nagamatsu ef al. presented a 32 x
32-bit multiplier in which Booth radix-4 was used to generate
the partial products and a tree of 4:2 counters was used to
reduee these partial products [4]. Mori ef d. designed a 54
x 54-bit multiplier similar in structure to that of [4], also
utilizing Booth radix~ and 42 counters [5]. In [6], Goto
This research was supported in part by the Natimud Science Foun-
dation under Grant No. MIP-920S165 and Grant No. MIP-94238S6, the
Army Research Oftiee under Grant No. DAAH04-93-G-0323, and by a
grant from the Xerox Corporation.
2Department of Electrical Engineering
University of Rochester
Rochester, New York 14627
400,
pw
bit2*Hz :L
3s0 (1)
2s0
150
100
t
+ (2)
50
0
19$9 1990 1991 199a 1992 1994 1995
Year
Figure 1. Multiplierpower factor
et al. presented a 54 x 54-bit multiplier with Booth radix-4
partird product generation, but used a regularly structured
tree for partiat product reduction, thereby simplifying the
physiezd layout. Lu and Samueli were most concerned
with throughput in the design of the multiplier-accumulator
described in [7], and thus they presented a 13-stage, deeply
pipelined 12 x 12-bit multiplier-accumulator which used
no encoding and was implemented with a quasi-domino
dynamic logic family. The data point representing this work
is a 64 x 64-bit multiplier using both Booth mlix-4 and
miix-8 encoding with a Dad& reduction tree.
In this paper a hybrid Bodt radix4/radix-8 multiplier
architecture is presented as a method to trade-off speed and
power dissipation in two’s complement signed multipliers.
The improved speed and power dissipation characteristics
of this new multiplier architecture are compared with that
of standard radix~ and radix-8 based multipliers. The hy-
brid radix-4/radix-8 architecture presented in this paper is
described in Section II. The speed and power dissipation
characteristics of the three multiplier architectures are com-
pared in Section III. Finally, some conclusions are drawn in
Section IV.
II. Hybrid Radix Architecture (Radix-4/Radix-8)
The proposed hybrid radix-4/mdix-8 multiplier archi-
tecture uses a combination of modified Booth radix-4 and
radix-8 encoding [8–10]. The hybrid radix-4/radix-8 archi-
tecture mitigates the delay penalty associated with the gen-
eration of 3B (see Figure 2) for radix-8 encoding by using
the additional parallelism of the radix-4 encoding/reduction.
In this manner the hybrid radix-4/radix-8 multiplier com-
bines the speed advantage of the radix-4 multiplier with the
reduced power dissipation of the radix-8 multiplier.
In a radix-8 architecture, the multiplication proee-ss
is serially dependent upon the time required to generate
3B: while 3B is being generated by a high speed adder,
no other processing can take place within the multiplier.
This requirement to generate 3B leads to a significant delay
penalty, on the order of 1O-2O%,as compared with a radix-4
architecture (where the partial products may be generated by
simple shifting and/or complementing) [11].
In the hybrid radix-4/radix-8 architecture, a subset of
the partial products are generated using radix-4 modified
Copyright (C) 1996. All Rights Reserved.
CFigure 2. Hybridradix-4/radix-8multiplierimchitecture
Booth encoding. Reduction begins on these radix-4 par-
tial products while 3B is simultaneously being generated
by a high speed adder. Upon generating 3B, the remain-
ing partial products are generated using radix-8 encoding,
and these partial products are subsequently included within
the reduction tree. A Wallace/Dadda structure is assumed
for the reduction tree [12,13]. In this manner, some reduc-
tion of the partial products takes place while the high speed
adder is generating 3B; therefore, less of a delay penalty is
incurred. Utilizing radix-8 encoding for many of the par-
tial products reduces the total number of partial products,
thereby reducing the power required to sum the partial pro-
ducts.As described in Section III, three reduction steps take
place during the genemtion of 3B for both the 32x 32 bit
multiplier and the 64 x 64 bit multiplier. A diagram of the
hybrid radix-4/rWix-8 architecture is shown in Figure 2.
It is important to note that the delay penalty amoci-
ated with the generation of 3B can not be entirely mitigated
using this hybrid approach. An additional delay jwmlty is
incurred since all of the partial products are not immedi-
ately available when the reduction process is initiated. As
Wallace/Dadda reduction trees utilize parallel adder cells to
perform the partial product reduction, the more parallel data
available to the tree, the more time efficient the reduction
steps become. Thus, the availability of only a subsel of
the partial products at the initiation of the reduction process
reduces the efficiency of the early reduction steps.
By delaying the generation of the radix-8 partial prod-
ucts until three reduction steps have been completed, fewer
bits in parallel are initially available. Thus, the reduction
process is not as time efficient, requiring additional reduc-
tion steps as compared with an architecture in which all the
pactial products are available simultaneously when the re-
duction process begins. In essence, the parallelism of the
reduction tree is reduced in exchange for operating the re-
duction tree in parallel with the 3B adder.
By selecting the number of partial products generated
by radix-4 and radix-8 encoding, it is possible to limit the
number of reduction steps to just one more step than is re-
quired by a radix-4 multiplier (assuming 32x 32 bit and
64x 64 bit multipliers). For a 64x 64 bit hybrid radix-
4/radix-8 multiplier, ten partial products are generated by
the radix-4 encoding and 15 by the radix-8 encoding. As
the radu-8 partial products are not immediately available, it
is convenient to use radix-8 on the lower order partial prod-
ucts, as the low order bits are not used in the early reduction
steps. A 32 x 32 bit hybrid radix-4/radix-8 multiplier imple-
mentation has eight partial products generated by the radix-4
encoding and six partial products generated by the radix-8
encoding.
For this 64 x 64 bit hybrid radix-4/radix-8 itnplemen-
tation, the tequired nine reduction steps are as follows: 11
+9+6+4+15+13 -+9+6-4-+3-+2. For
comparison, a 64 x 64 bit radix-4 multiplier requires eight
reduction steps: 34-+28+19 +13+9+.6+4+
3 -+ 2, while a 64 x 64 bit radix-8 multiplier requires only
seven reduction steps: 23 + 19 - 13 + 9 + 6 + 4 +
3 -+ 2 [11]. Note that by using the one’s complement plus
the carry-in to form the two’s complemen4 the number of
bits at the start of the reduction process is one bit greater
than the number of partial products. This additional bit is
the carry-in of the highest order partird product. Thus, the
hybrid reduction begins at eleven bits, although there are
only ten partial products. However, when the radix-8 partial
products become available after the third reduction step, the
carry-in from the highest order radix-8 partial product does
not align with any of the resultant bits from the first three
reduction steps. Hence, the fourth reduction step begins with
the four resultant bits plus the 15 radix-8 partial products,
rather than four bits plus 16 partial products.
Whh a 32 x 32 bit multiplier, seven steps are required
for partial product reduction in a hybrid radix-4/tmlix-8 im-
plementation, as compared with six for a radix~ implemen-
tation and five for a radix-8 implementation. The reduction
steps for the 32 x 32 bit hybrid radix-4/radix-8 implementa-
tionare9d 6+4-3+6+6 -4-.+3+2.
111. Performance
The propagation detay, transistor count, and power dis-
sipation characteristics of the 32 x 32 bit and the 64 x 64
bit multipliers are presented in this section. In subsection
A, the delay of the new hybrid radix-4/radix-8 multiplier
architecture is compared with the delay of the radix-4 and
radix-8 multiplier architectures. In subsection B, the num-
ber of transistors required to implement each of the multi-
pliers is presented. The power dissipation characteristics of
the thin? architectures are compared in subsection C. ‘Ihe
power dissipation characteristics of the three architectures
with scaled power supply voltages assuming constant delay
are compared in subsection D.
A. Delay Analysis
The 32 x 32 bit and 64 x 64 bit multipliers have been
simulated in SPICE based on a 5 volt, 1.2 pm CMOS
process technology. The delay of the worst case path of
each multiplier architecture is shown in Table I. The radix~
multiplier exhibits the least delay, and the radix-8 multiplier
exhibits the most delay. The hybrid radix-4/radix-8 delay
falls between those of the radix-4 and radix-8 multipliers.
Note that the delays shown in Table I do not include the
effects of interconnect impedances.
Table I. Technology dependent delay of
multiplier architectures (1.2 ,um, 5 volt CMOS)
Radix-4
Hytmid
Radix-4/S
Radix-8
Partial Pm&@ Gene.ratinn 3.3 m 3.3 ns 9.2 ns
&tx&$ bit
Reduction 13.9 m 16.3 M 122 m
Finat High Speed Addition 9.0 m 9.0 33s 9.0 ns
Tntal 26.2 IIS 28.6 m 30.4 IIS
Partial Pm&t Gemratinn 3.3 ns 3.3 ns 7.4 ns
32 x 32 bit
Reduction 10.4 na 122 ns 8.7 ns
Fiat High s peed Additinn 7.9 ns 7.9 m 7.9 us
Total 21.6 m 23.4 IIS 24.0 us
B. Transistor Count
The number of transistors nxp.dred to implement a rout-
tiplier architecture can provide a metric by which to judge
the relative atea requirements and power dissipation of the
different architecttues, assuming that switching probabilities
for the transistors are relatively constant across architectures,
as is the case in these multipliers. The transistor count for
Copyright (C) 1996. All Rights Reserved.
the 32 x 32 bit and 64 x 64 bit implementations of each of
the three architectures are compared in Table II. The radix-
8 implementations require the fewest transistors, while the
radix-4 implementations require the most transistors. The
number of transistors required to implement the hybrid radix-
4/radix-8 multipliers falls between those of the radix-4 and
radix-8 multipliers.
TableII. Transistorcount for eaeh multiplierimplementation
Bit Width Radix-4
Hybrid
Radix-m
Radix-8
32X 32 28,522 25,678 23,542
64x64 108,038 90,210 83,412
C. Power Dissipation
The power dissipation of the multipliers has also been
analyzed based on a 5 vol~ 1.2 pm CMOS process technol-
ogy. The average power dissipation of each circuit operating
at 10 MHz is determined from SPICE using the Kang power
meter [14]. The power dissipation of each component is
averaged over 100 random input vectors. The input control
signals that drive the decoder/selector circuitry are weighted
such that the probabilities of the control signals within the
test set conform to the signal assertion probabilities gener-
ated by the encoder, e.g., Cl is twice as likely to be asserted
as either COor C2 in a radix~ decoder/selector. The results
of these simulations are presented in Table III.
Note that the encoder cells in the 64 x 64 bit and
32x 32 bit multipliers are identid, however, the loading
differs by a factor of approximately two. This distinction
accounts for the differences in encoder power dissipation
between the two multiplier configurations. In an n x n bit
multiplier, a radix-4 encoder drives n+l decoders, while a
radix-8 encoder drives n+2 decoders. ‘@wed buffers [15]
have been included between the encoders and decoders to
drive this large fanouri and the power dissipation of these
buffets has been included in the total power dissipation of
the encoder listed in Table III. As with the delay vatues
presented in Table I, these power dissipation figures do not
account for interconnect impedances.
Also note that although the sign generation circuitry is
identical for both the radix+ and radix-8 implementations,
the power dissipation of this circuit is not identical for
Table III. Power dissipation of multiplier
eom~onents. 5 V. 1.2 urn CMOS, 10 MHz..
I Multiplier CornPorrents I
Power Dksipstion
( 11.w) II r-,
Full Adder Cell I 18.5 I
Radix-4 Encoder
64x64 bit 236
32X 32 bit 154
Radix-4 Deeoder 15.0
Radix-8 Erteoder
64 X 64 bk 380
32X 32 bit 258
Radix-8 Deeoder 19.9
Radix4 Mm Generation 31.5
Radix-8 Sigrr Generation 29.9
3B Adder
65 bit 2239
33 bit I 1130 Ir
Final Hi8h Speed Adder
12$ bit 4403
64 bit 2’202
Table IV. Total hardwme and power dissipation for a
64 x 64 bit multiplier. 5 V, 1.2 pm CMOS, 10 MHz.
i Radix-4 I Hybrid Radix-4/S I Radix-tl (
1r!!noncnt I ‘%klDiQ%oJ ‘Y’%3i2%%nlN’T’wci.o_....r.
1.RitAdder. —-...-—..I
-..
1 I t 1 I __..-
Rad-1 Fmmder[ 32 7.55 I 10 2.36 ]–j–
0 I 31.20 650 I 9.751 –l -
Rsd-8 Ereoder – 15 5.70 22 836
Rsd-8 Deeoder – – 990 19.70 1452 28.89
llad~ Sign 32
&l.
1.01 10 0.32 – –
R,sd-8Sign _ _
&l.
15 0.45 22 0.66
Sign bits 961 1.00 657 0.70 630 0.66
3B Adder – – 1 2.24 1 2.24
Fmat Adder 1 4.40 1 4.40 1 4.40
TorafPOWC.X
Dissiition –
99.1 - 85.9 – 81.3
both applications. This disparity between the radix-4 and
the radix-8 power dissipation exists because the control
signal input CO does not toggle as frequently in a radix-8
implementation as it does in a radix-4 implementation. This
disparity in toggling frequency leads to lower dynamic power
dissipation in the sign bit generation circuit. As the number
of sign extension bits varies for each partial produc~ the
power dissipation of the sign generation circuitry (as shown
in Table III) accounts for only the loading of the first stage of
the tapered buffers which drive the sign extension bits. Since
the tapered buffer is customized for the specific loading of
each partial product, the power dissipation of these buffers is
included in the architecture-specific power dissipation totals
presented in Tables IV and V. Finally, note that the power
dissipation of the high speed adders are estimations based
upon the simulated power dissipation of the addem assuming
the 1.2 pm, 5 volt CMOS technology used in these example
circuits and upon extrapolations from data presented in [16].
The total power dissipated by each multiplier architec-
ture is shown for a 64 x 64 bit multiplier in Table IV and
for a 32x 32 bit multiplier in ‘Mble V. For simplicity, hatf
adder cells have teen considered as full adder cells and are
shown as l-bit adders in Tables IV and V.
As described reviously and shown in Tables IV and
Table V. Tot#hardware and power dissipationfor a
32 x 32 bit multiplier. 5 V, 1.2 pm CMOS, 10 MHz.
Radix-4 Hybrid Radix+S Radix-8
Number POWCZ Number Power Number Powex
Cmnpnmt Dissipation of Dissipation of Oimipation
In:= (mW) Instances (mW) tnstanees (mW)
l-Bit Adder 690 1277 531 9.82 455 8.42
Rad4 Encoder 16 2.46 7 1.08 – -
l+ad~
Deco&
52s 7.92 231 3.47 —
Rad-8 Enmder – – 6 1.55 11 2.84
Rad-8
fk’xk - –
204 4.06 374 7.44
ltad~ SiSn 16
&l.
0.50 7 0.22 - -
Rsd-8 Siin _ _
&l.
6 0.18 11 0.33
Sign bits 22s 0.27 159 0.20 145 0.18
3B Adder - – 1 1.13 1 1.13
Fiat Adder 1 2.20 1 220 1 2.20
Ted pOWCJ
Dissbation -
26.1 - 23.9 - 225
Copyright (C) 1996. All Rights Reserved.
V, a radix-8 multiplier dissipates less power than a radix-4
multiplier. The hybrid radix4/radix-8 architwture dissipates
power at a level betwea that of the radix-4 and radix-
8 multipliers. Thus, the hybrid radix-4/radix-8 multiplier
archkcture is a useful architecture for those applications
which require low power while operating at speeds greater
than that of a full radix-8 multiplier. Radix-8 multiplication
is appropriate for those ultra-low power systems in which
added delay can be tolerated.
D. The Effects of Voltage Scaling on Performance
Voltage scaling, reducing the power supply voltage,
may be applied to higher speed multipliers to reduce the
power dissipation of these circuits, while simultaneously in-
creasing delay. The delay of the multipliers is proportional to
the power supply, V~~, as shown in (I), where VT represents
the transistor threshold voltage, and the power dissipation is
proportional to the square of the power supply voltage as
shown in (2) [17].
Delay m
VDD
(VDD - VT)’
(1)
Power m (VDD)2 (2)
The power dissipation of the radix-4, hybrid radix-
4/radix-8, and radix-8 multipliers after voltage scaling is
compared in Thble VI. Note that the scaled voltage levels
are referenced to the radix-8 multiplier operating at 5 volts.
For shorter bit widths such as exemplified by a 32 x 32 bit
multiplier, the delay and power dissipation overhead due to
the additional 3B adder and more complex encoding is not
outweighed by the reduction in delay and power dissipation
associated with the partial product summation. In this case,
the simpler radix~ encoded multiplier provides the lowest
power dissipation at a given delay.
However at higher bit widths, as exemplified by the
64 x 64 bit multiplied, the radix-4 and radix-8 multipliers
dissipate approximately equivalent power at a given delay,
both of which are greater than the hybrid radix4/radix-8
multiplier.
Table VL Comparison of performance
of voltage scaled multiplier architectures
Hybrid
Radix-4 Radix- Radix-8 I
4/8
64 x 64 bit
VmJfor 30.4 ns delay 4.53 v 4.79 v 5.00 v
Power dissipation 81.3 mW 78.8 mW 81.3 mW
32 X 32 bit
VDD for 24.0 ns delay 4.65 V 4.91 v 5.00 v
Power dissipation 22.5 mW 23.1 mW 24.0 mW
IV. Conclusions
As higher bit widths and lower power become impor-
tant design issues in multipliers, the opportunity to develop
new architectures to meet these requirements arises. A new
hybrid radix-4/radix-8 multiplier architecture is presented in
this paper that is both low power and high spe@ this ar-
chitecture provides a trade-off between the high speed of a
radix-4 multiplier architecture and the low power dissipation
of a radix-8 multiplier architecture. In this hybrid radix-
4/radix-8 multiplier archhectttre, the performance bottleneck
of a radix-8 multiplier (the generation of 3B for the radix-8
partial product generation) is performed in parallel with the
reduction of the radix-4 partial products rather than serially,
as in a radix-8 multiplier. Thus, the hybrid radix-4/radix-
8 multiplier accomplishes a portion of the partial product
reduction while a high speed adder is generating 3B. This
strategy minimizes a portion of the delay penalty incurred
by the radix-8 multiplier in generating 3B. The hybrid radix-
4/radix-8 multiplier architecture dissipates. 13% less power
in a 64 x 64 bit multiplier with only a 9% increase in delay,
as compared to a radix-4 implementation. When the supply
voltage of the 64 x 64 bit multipliers is scaled such that
the radix-4, radix-8, and hybrid radix-4/radix-8 multipliers
exhibit the same delay, the hybrid radix-4/radix-8 multiplier
dissipates the least power. The hybrid radix+radix-8 ar-
chitecture therefore provides a trade-off between high speed
and low power for application to those systems which re-
quire both high speed and low power signed multiplication.
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
References
S. R. Powetl and P. M. Chau, “Estimating Power Dissipation
of VLSI Signal Processing Chips: the PFA Technique,” VLSI
Signal Processing IV, ch. 24, New York: IEEE Press, 1990.
R. Sharrn~ A. D. LopeL J. A. Michej@ S. J. Hillenius, J. M.
Ancfrews, and A. J. Studwel~ “A 6.75ns 16x16-bit Multiplier
in Single-Level-Metal,” IEEE Journal of Solid-Siate Circuits,
Vol. SC-24, No. 4, pp. 922-927, August 1989.
K. Yaoo,T. Yaman@ T. Niahi& M. Saito, K. Shimohigsahi,
and A. Shimizu, “A 3.8-ns CMOS 16x16-b Multiplier Using
Complementary Pass-Transistor Logicfl IEEE Journal of
~&o-State Circuits, Vol. SC-25, No. 2, pp. 388–395, April
M. Nagamatau, S. Tmak~ J. Mori, K. Hirano, T. Noguchi,
and K. Hatartak& “A 150s 32x32b CMOS Multiplier with
an Improved Paratlel Structure,” IEEE Journal of Solid-State
Curcuits, Vol. SC-25, No. 2, pp. 494497, April 1990.
J. MorL M. Nagarnatau, M. Hirano, S. Tartak~ M. Noda,
Y. Yoyoshim& K. Hashimoto, H. Hayashid& and K. Maeguchi,
“A 10ns 54x54b Parallel Structured Full Array Multiplier with
0.5pm CMOS Technology? IEEE Journal of Solid-State Cir-
cuits, Vol. SC-26, No. 4, pp. 600-606, April 1990.
G. Goto, T. Sate, M. Nakajitn& and T. Sukemur~ “A 54x54-b
Regularly Structured Tree Multiplier; IEEE Journal of Solid-
Slate Circuits, Vol. SC-27, No. 9, pp. 1229-1236, September
1992.
F. Lu and H. Ssrnueli, “A 200-MHz CMOS Pipelirted
Multiplier-Accumulator using a Quasi-Domino Full-Adder
Cett Design: IEEE Journal of Solid-State Circuits, Vol. SC-
28, No. Z pp. 123132, February 1993.
A. D. Booth, “A Signed Binary Multiplication Technique,”
Quarterly Journal of Mechanics and Applied Mathematics,
Vol. 4, No. 2, pp. 236-2450, June 1951.
0. L. MacSorley, “High-Speed Arithmetic in Binary Comput-
ersfl Proceedings of the IRE, Vol.49, pp. 67-91, January 1961.
[10] H, Sam and A. Gup@ “A Generalized Multibit Recoding of
Tko’s Complement Binary Numbers and Its Proof with Appli-
cation in Multiplier Implementations,” IEEE Transactions on
Computers, Vol. C-39, No. 8, pp. 1006-1015, August 1990.
[11] B. Millar, P. E. Madrid, and E. E. Swartzlsnder, Jr., “A Fast
Hybrid Multiplier Combining Booth and Wallace/f)adda Al-
gorithms? Proceedings of the 35h IEEE Midwest Symposium
on Circuits and Systems, pp. 158–165, August 1992.
[12] C. S. Wallace, “A Suggestion for a Fast Multiplierfl IEEE
Transactwnr on Electronic Computers, Vol. EC-13, pp. 14-
17, February 1964.
[13] L. DadW “Some Schemes for Parallel Multipliers: Alta
Frequenza, Vol. 34, No. 5, pp. 349-356, May 1965.
[14] S. M. Kartg, “Accurate Simulation of Power Dissipation in
VLSI Circuits,” lEEEJournai of Solid-State Circuits, Vol. SC-
21, No. 5, pp. 889-891, October 1986.
[15] B. S. Cherkauer and E. G. Friedman, “A Unified Design
Methodology for CMOS Tapered Buffers,” IEEE Transactions
on VLSI Systemr, Vol.VLSI-3, No. 1, pp. 99-111, March 1995.
[16] T. K. Callaway and E. E. Swartzlander, Jr., “Estimating the
Power Consumption of CMOS Adders; Proceedings of the
Ilti IEEE Symposium on Computer Arithmetic, pp. 210-216,
June/July 1993.
[17] A. P. Chandrakasart, S. Sheng, and R. W. Broderson, “Low-
Power CMOS Digital Design,” IEEE Journal of Solid-State
Circuits, Vol. SC–27, No. 4. pp. 473483, April 1992.
.Copyright (C) 1996. All Rights Reserved.

Contenu connexe

Tendances

FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...
FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...
FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...AishwaryaRavishankar8
 
A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...
A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...
A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...Kumar Goud
 
IRJET- Efficient Design of Radix Booth Multiplier
IRJET- Efficient Design of Radix Booth MultiplierIRJET- Efficient Design of Radix Booth Multiplier
IRJET- Efficient Design of Radix Booth MultiplierIRJET Journal
 
High Speed radix256 algorithm using parallel prefix adder
High Speed radix256 algorithm using parallel prefix adderHigh Speed radix256 algorithm using parallel prefix adder
High Speed radix256 algorithm using parallel prefix adderIJMER
 
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...Angel Yogi
 
MAC UNIT USING DIFFERENT MULTIPLIERS
MAC UNIT USING DIFFERENT MULTIPLIERSMAC UNIT USING DIFFERENT MULTIPLIERS
MAC UNIT USING DIFFERENT MULTIPLIERSBhamidipati Gayatri
 
Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...
Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...
Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...IOSR Journals
 
VLSI Implementation of High Speed & Low Power Multiplier in FPGA
VLSI Implementation of High Speed & Low Power Multiplier in FPGAVLSI Implementation of High Speed & Low Power Multiplier in FPGA
VLSI Implementation of High Speed & Low Power Multiplier in FPGAIOSR Journals
 
Area Efficient and Reduced Pin Count Multipliers
Area Efficient and Reduced Pin Count MultipliersArea Efficient and Reduced Pin Count Multipliers
Area Efficient and Reduced Pin Count MultipliersCSCJournals
 
Compressor based approximate multiplier architectures for media processing ap...
Compressor based approximate multiplier architectures for media processing ap...Compressor based approximate multiplier architectures for media processing ap...
Compressor based approximate multiplier architectures for media processing ap...IJECEIAES
 
222083242 full-documg
222083242 full-documg222083242 full-documg
222083242 full-documghomeworkping9
 
High Performance MAC Unit for FFT Implementation
High Performance MAC Unit for FFT Implementation High Performance MAC Unit for FFT Implementation
High Performance MAC Unit for FFT Implementation IJMER
 
Area-Delay Efficient Binary Adders in QCA
Area-Delay Efficient Binary Adders in QCAArea-Delay Efficient Binary Adders in QCA
Area-Delay Efficient Binary Adders in QCAIJERA Editor
 
128 bit low power and area efficient carry select adder amit bakshi academia
128 bit low power and area efficient carry select adder   amit bakshi   academia128 bit low power and area efficient carry select adder   amit bakshi   academia
128 bit low power and area efficient carry select adder amit bakshi academiagopi448
 

Tendances (20)

Array multiplier
Array multiplierArray multiplier
Array multiplier
 
Fd3110361040
Fd3110361040Fd3110361040
Fd3110361040
 
FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...
FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...
FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Res...
 
A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...
A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...
A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...
 
IRJET- Efficient Design of Radix Booth Multiplier
IRJET- Efficient Design of Radix Booth MultiplierIRJET- Efficient Design of Radix Booth Multiplier
IRJET- Efficient Design of Radix Booth Multiplier
 
Hz3115131516
Hz3115131516Hz3115131516
Hz3115131516
 
High Speed radix256 algorithm using parallel prefix adder
High Speed radix256 algorithm using parallel prefix adderHigh Speed radix256 algorithm using parallel prefix adder
High Speed radix256 algorithm using parallel prefix adder
 
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
 
MAC UNIT USING DIFFERENT MULTIPLIERS
MAC UNIT USING DIFFERENT MULTIPLIERSMAC UNIT USING DIFFERENT MULTIPLIERS
MAC UNIT USING DIFFERENT MULTIPLIERS
 
Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...
Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...
Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filt...
 
Ad04606184188
Ad04606184188Ad04606184188
Ad04606184188
 
VLSI Implementation of High Speed & Low Power Multiplier in FPGA
VLSI Implementation of High Speed & Low Power Multiplier in FPGAVLSI Implementation of High Speed & Low Power Multiplier in FPGA
VLSI Implementation of High Speed & Low Power Multiplier in FPGA
 
Area Efficient and Reduced Pin Count Multipliers
Area Efficient and Reduced Pin Count MultipliersArea Efficient and Reduced Pin Count Multipliers
Area Efficient and Reduced Pin Count Multipliers
 
Compressor based approximate multiplier architectures for media processing ap...
Compressor based approximate multiplier architectures for media processing ap...Compressor based approximate multiplier architectures for media processing ap...
Compressor based approximate multiplier architectures for media processing ap...
 
222083242 full-documg
222083242 full-documg222083242 full-documg
222083242 full-documg
 
Q045079298
Q045079298Q045079298
Q045079298
 
High Performance MAC Unit for FFT Implementation
High Performance MAC Unit for FFT Implementation High Performance MAC Unit for FFT Implementation
High Performance MAC Unit for FFT Implementation
 
Design and Analysis of 4-2 Compressor for Arithmetic Application
Design and Analysis of 4-2 Compressor for Arithmetic ApplicationDesign and Analysis of 4-2 Compressor for Arithmetic Application
Design and Analysis of 4-2 Compressor for Arithmetic Application
 
Area-Delay Efficient Binary Adders in QCA
Area-Delay Efficient Binary Adders in QCAArea-Delay Efficient Binary Adders in QCA
Area-Delay Efficient Binary Adders in QCA
 
128 bit low power and area efficient carry select adder amit bakshi academia
128 bit low power and area efficient carry select adder   amit bakshi   academia128 bit low power and area efficient carry select adder   amit bakshi   academia
128 bit low power and area efficient carry select adder amit bakshi academia
 

Similaire à A hybrid radix 4 radix-8 low-power, high

PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...Hari M
 
Low Power 16×16 Bit Multiplier Design using Dadda Algorithm
Low Power 16×16 Bit Multiplier Design using Dadda AlgorithmLow Power 16×16 Bit Multiplier Design using Dadda Algorithm
Low Power 16×16 Bit Multiplier Design using Dadda Algorithmijtsrd
 
A Review of Different Methods for Booth Multiplier
A Review of Different Methods for Booth MultiplierA Review of Different Methods for Booth Multiplier
A Review of Different Methods for Booth MultiplierIJERA Editor
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSIIRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSIIRJET Journal
 
Parallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix MultiplicationParallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix MultiplicationIJERA Editor
 
Comparative Analysis of 11T and 16T and 28T Full Adder Based 4*4 Wallace Tree...
Comparative Analysis of 11T and 16T and 28T Full Adder Based 4*4 Wallace Tree...Comparative Analysis of 11T and 16T and 28T Full Adder Based 4*4 Wallace Tree...
Comparative Analysis of 11T and 16T and 28T Full Adder Based 4*4 Wallace Tree...IRJET Journal
 
High Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic MultiplierHigh Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic MultiplierIJERA Editor
 
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. TechniqueDesign and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. TechniqueIJMER
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...IRJET Journal
 
Implementation of Radix-4 Booth Multiplier by VHDL
Implementation of Radix-4 Booth Multiplier by VHDLImplementation of Radix-4 Booth Multiplier by VHDL
Implementation of Radix-4 Booth Multiplier by VHDLpaperpublications3
 
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. TechniqueDesign and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. TechniqueIJMER
 
IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...
IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...
IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...IRJET Journal
 
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET Journal
 
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLEA SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLEEditor IJMTER
 

Similaire à A hybrid radix 4 radix-8 low-power, high (20)

346 351
346 351346 351
346 351
 
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
PARTIAL PRODUCT ARRAY HEIGHT REDUCTION USING RADIX-16 FOR 64-BIT BOOTH MULTI...
 
Bn26425431
Bn26425431Bn26425431
Bn26425431
 
Low Power 16×16 Bit Multiplier Design using Dadda Algorithm
Low Power 16×16 Bit Multiplier Design using Dadda AlgorithmLow Power 16×16 Bit Multiplier Design using Dadda Algorithm
Low Power 16×16 Bit Multiplier Design using Dadda Algorithm
 
A Review of Different Methods for Booth Multiplier
A Review of Different Methods for Booth MultiplierA Review of Different Methods for Booth Multiplier
A Review of Different Methods for Booth Multiplier
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSIIRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
 
Parallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix MultiplicationParallel Processing Technique for Time Efficient Matrix Multiplication
Parallel Processing Technique for Time Efficient Matrix Multiplication
 
Comparative Analysis of 11T and 16T and 28T Full Adder Based 4*4 Wallace Tree...
Comparative Analysis of 11T and 16T and 28T Full Adder Based 4*4 Wallace Tree...Comparative Analysis of 11T and 16T and 28T Full Adder Based 4*4 Wallace Tree...
Comparative Analysis of 11T and 16T and 28T Full Adder Based 4*4 Wallace Tree...
 
High Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic MultiplierHigh Speed Area Efficient 8-point FFT using Vedic Multiplier
High Speed Area Efficient 8-point FFT using Vedic Multiplier
 
Ik3614691472
Ik3614691472Ik3614691472
Ik3614691472
 
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. TechniqueDesign and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
 
Implementation of Radix-4 Booth Multiplier by VHDL
Implementation of Radix-4 Booth Multiplier by VHDLImplementation of Radix-4 Booth Multiplier by VHDL
Implementation of Radix-4 Booth Multiplier by VHDL
 
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. TechniqueDesign and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
 
IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...
IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...
IRJET- Image and Signal Filtering using Fir Filter Made using Approximate Hyb...
 
B1030610
B1030610B1030610
B1030610
 
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
 
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLEA SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
 

Plus de Sudhir Kumar

air and dd question bank
air and dd question bank air and dd question bank
air and dd question bank Sudhir Kumar
 
model question on engineering assistants air and dd with answers
model question on engineering assistants air and dd with answersmodel question on engineering assistants air and dd with answers
model question on engineering assistants air and dd with answersSudhir Kumar
 
doordarshan and air question papers with answers
doordarshan and air question papers with answersdoordarshan and air question papers with answers
doordarshan and air question papers with answersSudhir Kumar
 
engineering assistants question papers in doordarshan and air
engineering assistants question papers in doordarshan and airengineering assistants question papers in doordarshan and air
engineering assistants question papers in doordarshan and airSudhir Kumar
 
apgenco 2012 question paper
apgenco 2012 question paperapgenco 2012 question paper
apgenco 2012 question paperSudhir Kumar
 
Sub-prime Crisis - Collapse of Lehman Brothers
Sub-prime Crisis - Collapse of Lehman BrothersSub-prime Crisis - Collapse of Lehman Brothers
Sub-prime Crisis - Collapse of Lehman BrothersSudhir Kumar
 
Application of electroceramics
Application of electroceramicsApplication of electroceramics
Application of electroceramicsSudhir Kumar
 
Simple regenerative receiver
Simple regenerative receiverSimple regenerative receiver
Simple regenerative receiverSudhir Kumar
 
Wallace tree multiplier
Wallace tree multiplierWallace tree multiplier
Wallace tree multiplierSudhir Kumar
 

Plus de Sudhir Kumar (15)

all india radio
all india radioall india radio
all india radio
 
air and dd question bank
air and dd question bank air and dd question bank
air and dd question bank
 
model question on engineering assistants air and dd with answers
model question on engineering assistants air and dd with answersmodel question on engineering assistants air and dd with answers
model question on engineering assistants air and dd with answers
 
doordarshan and air question papers with answers
doordarshan and air question papers with answersdoordarshan and air question papers with answers
doordarshan and air question papers with answers
 
engineering assistants question papers in doordarshan and air
engineering assistants question papers in doordarshan and airengineering assistants question papers in doordarshan and air
engineering assistants question papers in doordarshan and air
 
apgenco 2012 question paper
apgenco 2012 question paperapgenco 2012 question paper
apgenco 2012 question paper
 
Sub-prime Crisis - Collapse of Lehman Brothers
Sub-prime Crisis - Collapse of Lehman BrothersSub-prime Crisis - Collapse of Lehman Brothers
Sub-prime Crisis - Collapse of Lehman Brothers
 
Isdn (1)
Isdn (1)Isdn (1)
Isdn (1)
 
Application of electroceramics
Application of electroceramicsApplication of electroceramics
Application of electroceramics
 
Booth Multiplier
Booth MultiplierBooth Multiplier
Booth Multiplier
 
3 g tutorial
3 g tutorial3 g tutorial
3 g tutorial
 
Pda
PdaPda
Pda
 
Simple regenerative receiver
Simple regenerative receiverSimple regenerative receiver
Simple regenerative receiver
 
diode technology
diode technologydiode technology
diode technology
 
Wallace tree multiplier
Wallace tree multiplierWallace tree multiplier
Wallace tree multiplier
 

Dernier

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 

Dernier (20)

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 

A hybrid radix 4 radix-8 low-power, high

  • 1. A Hybrid Radix-4/Radix-8 LOWPower, High Speed Multiplier Architecture for Wide Bit Widths Brian S. Cherkauerl and Eby G. Friedrnan2 1Intel Corporation 2200 Mission College Blvd. Santa Clara California 95052 Abstract – A hybrid radix-4/radix-8 architecture targeted for high bit multipliers is presented as a compromise between the high speed of a radix-4 multiplier architecture and the low power dissipation of a radix-8 multiplier architecture. In this hybrid radix-4/radix-8 multiplier architecture, the perfor- mance bottleneck of a radix-8 multiplier, the generation of three times the multiplicand for use in generating the radix-8 partial product, is performed in parallel with the reduction of the radix+ partird products rather than serially, as in a radix-8 multiplier. This hybrid radix+radix-8 multiplier ar- chitecture requires 13% less power for a 64 x 64 bit multi- plier, and results in only a 9% increase in delay, as com- “pared with a radix~ implementation, When supply voltage is sealed such that all multipliers exhibit the same delay, the 64 x 64 bit hybrid radixJVradix-8 multiplier dissipates less power than either the radix-4 or radix-8 multipliers. The hy- brid radix-4/radix-8 amhiteeture is therefore appropriate for those applications that must dissipate minimal power and operate at high speeds. I. Introduction High speed multipliers are fundamental elements in sig- nal processing and arithmetic based systems. The higher bit widths required of modem multipliers provide the opportu- nity to explore new architectures which would be impmcticrd for smaller bit widfh multiplication. Architectures for circuit elements historically were designed to operate at maximum speed, notwithstanding the resulting power dissipation. Re- eently, gmter emphasis has been placed on reducing the power dissipation of important circuit functions while main- taining these high speeds. Therefore, power dissipation as well as circuit speed should be considered at the architec- tural level. A leveling off of the power factor, the power dissipated per bit2.Hz, and henee the power efficiency, has recently been observed [1]. This leveling of the power factor is illustrated in Figure 1. This trend leads to the conclusion that to fitrt.her improve the power efficiency of multiplied, power dissipation must be addressed at the architectural level as well as at the circuit level. The data in Figure 1 present the power factors for a number of recent multiplier implementations. Sharma er al. utilized Booth radix-4 eneoding along with a reduction array of carry save adders (CSAS) generated by a recursive algorithm to produce the 16 x 16-bit multiplier in [2]. In [3], Yano ef al. introduced the complementary pass-transistor logic family (CPL) and implemented a 16x 16-bit multiplier in CPL which used no encoding but did use a Wallace tree for partial product reduction. Nagamatsu ef al. presented a 32 x 32-bit multiplier in which Booth radix-4 was used to generate the partial products and a tree of 4:2 counters was used to reduee these partial products [4]. Mori ef d. designed a 54 x 54-bit multiplier similar in structure to that of [4], also utilizing Booth radix~ and 42 counters [5]. In [6], Goto This research was supported in part by the Natimud Science Foun- dation under Grant No. MIP-920S165 and Grant No. MIP-94238S6, the Army Research Oftiee under Grant No. DAAH04-93-G-0323, and by a grant from the Xerox Corporation. 2Department of Electrical Engineering University of Rochester Rochester, New York 14627 400, pw bit2*Hz :L 3s0 (1) 2s0 150 100 t + (2) 50 0 19$9 1990 1991 199a 1992 1994 1995 Year Figure 1. Multiplierpower factor et al. presented a 54 x 54-bit multiplier with Booth radix-4 partird product generation, but used a regularly structured tree for partiat product reduction, thereby simplifying the physiezd layout. Lu and Samueli were most concerned with throughput in the design of the multiplier-accumulator described in [7], and thus they presented a 13-stage, deeply pipelined 12 x 12-bit multiplier-accumulator which used no encoding and was implemented with a quasi-domino dynamic logic family. The data point representing this work is a 64 x 64-bit multiplier using both Booth mlix-4 and miix-8 encoding with a Dad& reduction tree. In this paper a hybrid Bodt radix4/radix-8 multiplier architecture is presented as a method to trade-off speed and power dissipation in two’s complement signed multipliers. The improved speed and power dissipation characteristics of this new multiplier architecture are compared with that of standard radix~ and radix-8 based multipliers. The hy- brid radix-4/radix-8 architecture presented in this paper is described in Section II. The speed and power dissipation characteristics of the three multiplier architectures are com- pared in Section III. Finally, some conclusions are drawn in Section IV. II. Hybrid Radix Architecture (Radix-4/Radix-8) The proposed hybrid radix-4/mdix-8 multiplier archi- tecture uses a combination of modified Booth radix-4 and radix-8 encoding [8–10]. The hybrid radix-4/radix-8 archi- tecture mitigates the delay penalty associated with the gen- eration of 3B (see Figure 2) for radix-8 encoding by using the additional parallelism of the radix-4 encoding/reduction. In this manner the hybrid radix-4/radix-8 multiplier com- bines the speed advantage of the radix-4 multiplier with the reduced power dissipation of the radix-8 multiplier. In a radix-8 architecture, the multiplication proee-ss is serially dependent upon the time required to generate 3B: while 3B is being generated by a high speed adder, no other processing can take place within the multiplier. This requirement to generate 3B leads to a significant delay penalty, on the order of 1O-2O%,as compared with a radix-4 architecture (where the partial products may be generated by simple shifting and/or complementing) [11]. In the hybrid radix-4/radix-8 architecture, a subset of the partial products are generated using radix-4 modified Copyright (C) 1996. All Rights Reserved.
  • 2. CFigure 2. Hybridradix-4/radix-8multiplierimchitecture Booth encoding. Reduction begins on these radix-4 par- tial products while 3B is simultaneously being generated by a high speed adder. Upon generating 3B, the remain- ing partial products are generated using radix-8 encoding, and these partial products are subsequently included within the reduction tree. A Wallace/Dadda structure is assumed for the reduction tree [12,13]. In this manner, some reduc- tion of the partial products takes place while the high speed adder is generating 3B; therefore, less of a delay penalty is incurred. Utilizing radix-8 encoding for many of the par- tial products reduces the total number of partial products, thereby reducing the power required to sum the partial pro- ducts.As described in Section III, three reduction steps take place during the genemtion of 3B for both the 32x 32 bit multiplier and the 64 x 64 bit multiplier. A diagram of the hybrid radix-4/rWix-8 architecture is shown in Figure 2. It is important to note that the delay penalty amoci- ated with the generation of 3B can not be entirely mitigated using this hybrid approach. An additional delay jwmlty is incurred since all of the partial products are not immedi- ately available when the reduction process is initiated. As Wallace/Dadda reduction trees utilize parallel adder cells to perform the partial product reduction, the more parallel data available to the tree, the more time efficient the reduction steps become. Thus, the availability of only a subsel of the partial products at the initiation of the reduction process reduces the efficiency of the early reduction steps. By delaying the generation of the radix-8 partial prod- ucts until three reduction steps have been completed, fewer bits in parallel are initially available. Thus, the reduction process is not as time efficient, requiring additional reduc- tion steps as compared with an architecture in which all the pactial products are available simultaneously when the re- duction process begins. In essence, the parallelism of the reduction tree is reduced in exchange for operating the re- duction tree in parallel with the 3B adder. By selecting the number of partial products generated by radix-4 and radix-8 encoding, it is possible to limit the number of reduction steps to just one more step than is re- quired by a radix-4 multiplier (assuming 32x 32 bit and 64x 64 bit multipliers). For a 64x 64 bit hybrid radix- 4/radix-8 multiplier, ten partial products are generated by the radix-4 encoding and 15 by the radix-8 encoding. As the radu-8 partial products are not immediately available, it is convenient to use radix-8 on the lower order partial prod- ucts, as the low order bits are not used in the early reduction steps. A 32 x 32 bit hybrid radix-4/radix-8 multiplier imple- mentation has eight partial products generated by the radix-4 encoding and six partial products generated by the radix-8 encoding. For this 64 x 64 bit hybrid radix-4/radix-8 itnplemen- tation, the tequired nine reduction steps are as follows: 11 +9+6+4+15+13 -+9+6-4-+3-+2. For comparison, a 64 x 64 bit radix-4 multiplier requires eight reduction steps: 34-+28+19 +13+9+.6+4+ 3 -+ 2, while a 64 x 64 bit radix-8 multiplier requires only seven reduction steps: 23 + 19 - 13 + 9 + 6 + 4 + 3 -+ 2 [11]. Note that by using the one’s complement plus the carry-in to form the two’s complemen4 the number of bits at the start of the reduction process is one bit greater than the number of partial products. This additional bit is the carry-in of the highest order partird product. Thus, the hybrid reduction begins at eleven bits, although there are only ten partial products. However, when the radix-8 partial products become available after the third reduction step, the carry-in from the highest order radix-8 partial product does not align with any of the resultant bits from the first three reduction steps. Hence, the fourth reduction step begins with the four resultant bits plus the 15 radix-8 partial products, rather than four bits plus 16 partial products. Whh a 32 x 32 bit multiplier, seven steps are required for partial product reduction in a hybrid radix-4/tmlix-8 im- plementation, as compared with six for a radix~ implemen- tation and five for a radix-8 implementation. The reduction steps for the 32 x 32 bit hybrid radix-4/radix-8 implementa- tionare9d 6+4-3+6+6 -4-.+3+2. 111. Performance The propagation detay, transistor count, and power dis- sipation characteristics of the 32 x 32 bit and the 64 x 64 bit multipliers are presented in this section. In subsection A, the delay of the new hybrid radix-4/radix-8 multiplier architecture is compared with the delay of the radix-4 and radix-8 multiplier architectures. In subsection B, the num- ber of transistors required to implement each of the multi- pliers is presented. The power dissipation characteristics of the thin? architectures are compared in subsection C. ‘Ihe power dissipation characteristics of the three architectures with scaled power supply voltages assuming constant delay are compared in subsection D. A. Delay Analysis The 32 x 32 bit and 64 x 64 bit multipliers have been simulated in SPICE based on a 5 volt, 1.2 pm CMOS process technology. The delay of the worst case path of each multiplier architecture is shown in Table I. The radix~ multiplier exhibits the least delay, and the radix-8 multiplier exhibits the most delay. The hybrid radix-4/radix-8 delay falls between those of the radix-4 and radix-8 multipliers. Note that the delays shown in Table I do not include the effects of interconnect impedances. Table I. Technology dependent delay of multiplier architectures (1.2 ,um, 5 volt CMOS) Radix-4 Hytmid Radix-4/S Radix-8 Partial Pm&@ Gene.ratinn 3.3 m 3.3 ns 9.2 ns &tx&$ bit Reduction 13.9 m 16.3 M 122 m Finat High Speed Addition 9.0 m 9.0 33s 9.0 ns Tntal 26.2 IIS 28.6 m 30.4 IIS Partial Pm&t Gemratinn 3.3 ns 3.3 ns 7.4 ns 32 x 32 bit Reduction 10.4 na 122 ns 8.7 ns Fiat High s peed Additinn 7.9 ns 7.9 m 7.9 us Total 21.6 m 23.4 IIS 24.0 us B. Transistor Count The number of transistors nxp.dred to implement a rout- tiplier architecture can provide a metric by which to judge the relative atea requirements and power dissipation of the different architecttues, assuming that switching probabilities for the transistors are relatively constant across architectures, as is the case in these multipliers. The transistor count for Copyright (C) 1996. All Rights Reserved.
  • 3. the 32 x 32 bit and 64 x 64 bit implementations of each of the three architectures are compared in Table II. The radix- 8 implementations require the fewest transistors, while the radix-4 implementations require the most transistors. The number of transistors required to implement the hybrid radix- 4/radix-8 multipliers falls between those of the radix-4 and radix-8 multipliers. TableII. Transistorcount for eaeh multiplierimplementation Bit Width Radix-4 Hybrid Radix-m Radix-8 32X 32 28,522 25,678 23,542 64x64 108,038 90,210 83,412 C. Power Dissipation The power dissipation of the multipliers has also been analyzed based on a 5 vol~ 1.2 pm CMOS process technol- ogy. The average power dissipation of each circuit operating at 10 MHz is determined from SPICE using the Kang power meter [14]. The power dissipation of each component is averaged over 100 random input vectors. The input control signals that drive the decoder/selector circuitry are weighted such that the probabilities of the control signals within the test set conform to the signal assertion probabilities gener- ated by the encoder, e.g., Cl is twice as likely to be asserted as either COor C2 in a radix~ decoder/selector. The results of these simulations are presented in Table III. Note that the encoder cells in the 64 x 64 bit and 32x 32 bit multipliers are identid, however, the loading differs by a factor of approximately two. This distinction accounts for the differences in encoder power dissipation between the two multiplier configurations. In an n x n bit multiplier, a radix-4 encoder drives n+l decoders, while a radix-8 encoder drives n+2 decoders. ‘@wed buffers [15] have been included between the encoders and decoders to drive this large fanouri and the power dissipation of these buffets has been included in the total power dissipation of the encoder listed in Table III. As with the delay vatues presented in Table I, these power dissipation figures do not account for interconnect impedances. Also note that although the sign generation circuitry is identical for both the radix+ and radix-8 implementations, the power dissipation of this circuit is not identical for Table III. Power dissipation of multiplier eom~onents. 5 V. 1.2 urn CMOS, 10 MHz.. I Multiplier CornPorrents I Power Dksipstion ( 11.w) II r-, Full Adder Cell I 18.5 I Radix-4 Encoder 64x64 bit 236 32X 32 bit 154 Radix-4 Deeoder 15.0 Radix-8 Erteoder 64 X 64 bk 380 32X 32 bit 258 Radix-8 Deeoder 19.9 Radix4 Mm Generation 31.5 Radix-8 Sigrr Generation 29.9 3B Adder 65 bit 2239 33 bit I 1130 Ir Final Hi8h Speed Adder 12$ bit 4403 64 bit 2’202 Table IV. Total hardwme and power dissipation for a 64 x 64 bit multiplier. 5 V, 1.2 pm CMOS, 10 MHz. i Radix-4 I Hybrid Radix-4/S I Radix-tl ( 1r!!noncnt I ‘%klDiQ%oJ ‘Y’%3i2%%nlN’T’wci.o_....r. 1.RitAdder. —-...-—..I -.. 1 I t 1 I __..- Rad-1 Fmmder[ 32 7.55 I 10 2.36 ]–j– 0 I 31.20 650 I 9.751 –l - Rsd-8 Ereoder – 15 5.70 22 836 Rsd-8 Deeoder – – 990 19.70 1452 28.89 llad~ Sign 32 &l. 1.01 10 0.32 – – R,sd-8Sign _ _ &l. 15 0.45 22 0.66 Sign bits 961 1.00 657 0.70 630 0.66 3B Adder – – 1 2.24 1 2.24 Fmat Adder 1 4.40 1 4.40 1 4.40 TorafPOWC.X Dissiition – 99.1 - 85.9 – 81.3 both applications. This disparity between the radix-4 and the radix-8 power dissipation exists because the control signal input CO does not toggle as frequently in a radix-8 implementation as it does in a radix-4 implementation. This disparity in toggling frequency leads to lower dynamic power dissipation in the sign bit generation circuit. As the number of sign extension bits varies for each partial produc~ the power dissipation of the sign generation circuitry (as shown in Table III) accounts for only the loading of the first stage of the tapered buffers which drive the sign extension bits. Since the tapered buffer is customized for the specific loading of each partial product, the power dissipation of these buffers is included in the architecture-specific power dissipation totals presented in Tables IV and V. Finally, note that the power dissipation of the high speed adders are estimations based upon the simulated power dissipation of the addem assuming the 1.2 pm, 5 volt CMOS technology used in these example circuits and upon extrapolations from data presented in [16]. The total power dissipated by each multiplier architec- ture is shown for a 64 x 64 bit multiplier in Table IV and for a 32x 32 bit multiplier in ‘Mble V. For simplicity, hatf adder cells have teen considered as full adder cells and are shown as l-bit adders in Tables IV and V. As described reviously and shown in Tables IV and Table V. Tot#hardware and power dissipationfor a 32 x 32 bit multiplier. 5 V, 1.2 pm CMOS, 10 MHz. Radix-4 Hybrid Radix+S Radix-8 Number POWCZ Number Power Number Powex Cmnpnmt Dissipation of Dissipation of Oimipation In:= (mW) Instances (mW) tnstanees (mW) l-Bit Adder 690 1277 531 9.82 455 8.42 Rad4 Encoder 16 2.46 7 1.08 – - l+ad~ Deco& 52s 7.92 231 3.47 — Rad-8 Enmder – – 6 1.55 11 2.84 Rad-8 fk’xk - – 204 4.06 374 7.44 ltad~ SiSn 16 &l. 0.50 7 0.22 - - Rsd-8 Siin _ _ &l. 6 0.18 11 0.33 Sign bits 22s 0.27 159 0.20 145 0.18 3B Adder - – 1 1.13 1 1.13 Fiat Adder 1 2.20 1 220 1 2.20 Ted pOWCJ Dissbation - 26.1 - 23.9 - 225 Copyright (C) 1996. All Rights Reserved.
  • 4. V, a radix-8 multiplier dissipates less power than a radix-4 multiplier. The hybrid radix4/radix-8 architwture dissipates power at a level betwea that of the radix-4 and radix- 8 multipliers. Thus, the hybrid radix-4/radix-8 multiplier archkcture is a useful architecture for those applications which require low power while operating at speeds greater than that of a full radix-8 multiplier. Radix-8 multiplication is appropriate for those ultra-low power systems in which added delay can be tolerated. D. The Effects of Voltage Scaling on Performance Voltage scaling, reducing the power supply voltage, may be applied to higher speed multipliers to reduce the power dissipation of these circuits, while simultaneously in- creasing delay. The delay of the multipliers is proportional to the power supply, V~~, as shown in (I), where VT represents the transistor threshold voltage, and the power dissipation is proportional to the square of the power supply voltage as shown in (2) [17]. Delay m VDD (VDD - VT)’ (1) Power m (VDD)2 (2) The power dissipation of the radix-4, hybrid radix- 4/radix-8, and radix-8 multipliers after voltage scaling is compared in Thble VI. Note that the scaled voltage levels are referenced to the radix-8 multiplier operating at 5 volts. For shorter bit widths such as exemplified by a 32 x 32 bit multiplier, the delay and power dissipation overhead due to the additional 3B adder and more complex encoding is not outweighed by the reduction in delay and power dissipation associated with the partial product summation. In this case, the simpler radix~ encoded multiplier provides the lowest power dissipation at a given delay. However at higher bit widths, as exemplified by the 64 x 64 bit multiplied, the radix-4 and radix-8 multipliers dissipate approximately equivalent power at a given delay, both of which are greater than the hybrid radix4/radix-8 multiplier. Table VL Comparison of performance of voltage scaled multiplier architectures Hybrid Radix-4 Radix- Radix-8 I 4/8 64 x 64 bit VmJfor 30.4 ns delay 4.53 v 4.79 v 5.00 v Power dissipation 81.3 mW 78.8 mW 81.3 mW 32 X 32 bit VDD for 24.0 ns delay 4.65 V 4.91 v 5.00 v Power dissipation 22.5 mW 23.1 mW 24.0 mW IV. Conclusions As higher bit widths and lower power become impor- tant design issues in multipliers, the opportunity to develop new architectures to meet these requirements arises. A new hybrid radix-4/radix-8 multiplier architecture is presented in this paper that is both low power and high spe@ this ar- chitecture provides a trade-off between the high speed of a radix-4 multiplier architecture and the low power dissipation of a radix-8 multiplier architecture. In this hybrid radix- 4/radix-8 multiplier archhectttre, the performance bottleneck of a radix-8 multiplier (the generation of 3B for the radix-8 partial product generation) is performed in parallel with the reduction of the radix-4 partial products rather than serially, as in a radix-8 multiplier. Thus, the hybrid radix-4/radix- 8 multiplier accomplishes a portion of the partial product reduction while a high speed adder is generating 3B. This strategy minimizes a portion of the delay penalty incurred by the radix-8 multiplier in generating 3B. The hybrid radix- 4/radix-8 multiplier architecture dissipates. 13% less power in a 64 x 64 bit multiplier with only a 9% increase in delay, as compared to a radix-4 implementation. When the supply voltage of the 64 x 64 bit multipliers is scaled such that the radix-4, radix-8, and hybrid radix-4/radix-8 multipliers exhibit the same delay, the hybrid radix-4/radix-8 multiplier dissipates the least power. The hybrid radix+radix-8 ar- chitecture therefore provides a trade-off between high speed and low power for application to those systems which re- quire both high speed and low power signed multiplication. [1] [2] [3] [4] [5] [6] [7] [8] [9] References S. R. Powetl and P. M. Chau, “Estimating Power Dissipation of VLSI Signal Processing Chips: the PFA Technique,” VLSI Signal Processing IV, ch. 24, New York: IEEE Press, 1990. R. Sharrn~ A. D. LopeL J. A. Michej@ S. J. Hillenius, J. M. Ancfrews, and A. J. Studwel~ “A 6.75ns 16x16-bit Multiplier in Single-Level-Metal,” IEEE Journal of Solid-Siate Circuits, Vol. SC-24, No. 4, pp. 922-927, August 1989. K. Yaoo,T. Yaman@ T. Niahi& M. Saito, K. Shimohigsahi, and A. Shimizu, “A 3.8-ns CMOS 16x16-b Multiplier Using Complementary Pass-Transistor Logicfl IEEE Journal of ~&o-State Circuits, Vol. SC-25, No. 2, pp. 388–395, April M. Nagamatau, S. Tmak~ J. Mori, K. Hirano, T. Noguchi, and K. Hatartak& “A 150s 32x32b CMOS Multiplier with an Improved Paratlel Structure,” IEEE Journal of Solid-State Curcuits, Vol. SC-25, No. 2, pp. 494497, April 1990. J. MorL M. Nagarnatau, M. Hirano, S. Tartak~ M. Noda, Y. Yoyoshim& K. Hashimoto, H. Hayashid& and K. Maeguchi, “A 10ns 54x54b Parallel Structured Full Array Multiplier with 0.5pm CMOS Technology? IEEE Journal of Solid-State Cir- cuits, Vol. SC-26, No. 4, pp. 600-606, April 1990. G. Goto, T. Sate, M. Nakajitn& and T. Sukemur~ “A 54x54-b Regularly Structured Tree Multiplier; IEEE Journal of Solid- Slate Circuits, Vol. SC-27, No. 9, pp. 1229-1236, September 1992. F. Lu and H. Ssrnueli, “A 200-MHz CMOS Pipelirted Multiplier-Accumulator using a Quasi-Domino Full-Adder Cett Design: IEEE Journal of Solid-State Circuits, Vol. SC- 28, No. Z pp. 123132, February 1993. A. D. Booth, “A Signed Binary Multiplication Technique,” Quarterly Journal of Mechanics and Applied Mathematics, Vol. 4, No. 2, pp. 236-2450, June 1951. 0. L. MacSorley, “High-Speed Arithmetic in Binary Comput- ersfl Proceedings of the IRE, Vol.49, pp. 67-91, January 1961. [10] H, Sam and A. Gup@ “A Generalized Multibit Recoding of Tko’s Complement Binary Numbers and Its Proof with Appli- cation in Multiplier Implementations,” IEEE Transactions on Computers, Vol. C-39, No. 8, pp. 1006-1015, August 1990. [11] B. Millar, P. E. Madrid, and E. E. Swartzlsnder, Jr., “A Fast Hybrid Multiplier Combining Booth and Wallace/f)adda Al- gorithms? Proceedings of the 35h IEEE Midwest Symposium on Circuits and Systems, pp. 158–165, August 1992. [12] C. S. Wallace, “A Suggestion for a Fast Multiplierfl IEEE Transactwnr on Electronic Computers, Vol. EC-13, pp. 14- 17, February 1964. [13] L. DadW “Some Schemes for Parallel Multipliers: Alta Frequenza, Vol. 34, No. 5, pp. 349-356, May 1965. [14] S. M. Kartg, “Accurate Simulation of Power Dissipation in VLSI Circuits,” lEEEJournai of Solid-State Circuits, Vol. SC- 21, No. 5, pp. 889-891, October 1986. [15] B. S. Cherkauer and E. G. Friedman, “A Unified Design Methodology for CMOS Tapered Buffers,” IEEE Transactions on VLSI Systemr, Vol.VLSI-3, No. 1, pp. 99-111, March 1995. [16] T. K. Callaway and E. E. Swartzlander, Jr., “Estimating the Power Consumption of CMOS Adders; Proceedings of the Ilti IEEE Symposium on Computer Arithmetic, pp. 210-216, June/July 1993. [17] A. P. Chandrakasart, S. Sheng, and R. W. Broderson, “Low- Power CMOS Digital Design,” IEEE Journal of Solid-State Circuits, Vol. SC–27, No. 4. pp. 473483, April 1992. .Copyright (C) 1996. All Rights Reserved.