Contenu connexe
Similaire à Design of 16 bit low power processor using clock gating technique 2-3
Similaire à Design of 16 bit low power processor using clock gating technique 2-3 (20)
Plus de IAEME Publication
Plus de IAEME Publication (20)
Design of 16 bit low power processor using clock gating technique 2-3
- 1. INTERNATIONAL JOURNAL OF ELECTRONICS AND
International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 3, Issue 3, October- December (2012), © IAEME
ISSN 0976 – 6464(Print)
ISSN 0976 – 6472(Online)
Volume 3, Issue 3, October- December (2012), pp. 333-340
IJECET
© IAEME: www.iaeme.com/ijecet.asp
Journal Impact Factor (2012): 3.5930 (Calculated by GISI)
©IAEME
www.jifactor.com
DESIGN OF 16 BIT LOW POWER PROCESSOR USING CLOCK
GATING TECHNIQUE
Khaja Mujeebuddin Quadry
Research Scholar, JNTU Ananatapur, A.P., India.
Email: mujeebqd@yahoo.com
Dr. Syed Abdul Sattar
Professor & Dean of Academics,
Royal Institute of Technology & Science, Chevella, R. R. Dist. A. P. India.
Email: syedabdulsattar1965@gmail.com
Dr. K. Soundara Rajan
Professor, Dept of ECE, JNTU Anantapur, A.P.,India.
Email: soundararajan_jntucea@yahoo.com
ABSTRACT
Low power design is gaining importance due to the increasing need of battery
operated portable devices with high computing capability. The reliability of integrated circuit
depends on the heat dissipated in the circuit. The cost of the system also increases with the
cooling systems for heat removal. A large fraction of the power consumed by a synchronous
logic is due to the clock distribution network and the high switching activity at the nodes.
Clock Gating is the well known technique used to reduce the clock power. In this paper we
have presented the design of 16 bit processor using 90nm technology by applying the clock
gating principle at the fine grained level to minimize the power dissipation.
I. INTRODUCTION
Clock gating is a technique used to reduce power dissipation in clock distributed
network. This is achieved by shutting down the clock of any component whenever it is not
being used or accessed. It involves inserting combinational logic along the clock path to
prevent the unnecessary switching of sequential elements. By shutting down the idle units we
can prevent the circuit from consuming unnecessary power. A portion of the clock tree can
also be shut down by masking off the clock at the internal node of the tree using an AND
gate.
333
- 2. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 3, Issue 3, October- December (2012), © IAEME
Figure1. Processor Power Breakdown[16]
This prevents wasteful switching in the clock tree and saves power in the clock tree in
addition to saving power in the functional units which are fed by the clock. In modern
processors and SoCs the clock distribution network is responsible for an increasing fraction
of the dynamic power consumption[15]. The Figure 1 shows the breakdown of power
consumption for a recent high-performance microprocessor[16]. The clock power is expected
to increase as the complexity and the operating frequency of the circuits keep growing as a
result of technology scaling [11]. Designing the clock tree has thus become critical not only
for performance, but also for power, and the development of new modeling capabilities and
synthesis techniques that help in controlling the clock tree power effectively is one of the
challenges that EDA engineers currently have to face[13]. Different solutions for minimizing
the power consumed by the clock tree have been investigated in the recent past. In this paper,
we have presented the design of 16 bit processor by applying the clock-gating technique for
power optimization at the gate and RT levels. The rest of this paper is organized as follows.
In Section II we briefly review previous work on minimization of power using clock gating.
Section III provides design of 16 bit processor and how clock gating is applied. Section IV
discusses simulation results. Finally, Section V concludes the manuscript with some final
remarks.
II. PREVIOUS WORK
The problem of minimizing the power dissipation by clock distribution networks has
been addressed by many authors and a brief overview of their work is mentioned below. In
[14] Jaewon Oh et.al presented a zero-skew gated clock routing technique for VLSI circuits.
In which they constructed a clock-tree topology based on the locations and the activation
frequencies of the modules, while the locations of the internal nodes of the clock tree are
determined using a dynamic programming approach. .In [11] Hans Jacobson et al. examined
the power reduction benefits of a couple of newly invented schemes called transparent
pipeline clock-gating and elastic pipeline clock-gating. In their work they have bounded the
practical limits of clock gating efficiency in future microprocessors. In [10] Jochen Preiss et
al. introduced fine-grain clockgating schemes for fused multiply-add-type floating-point units
(FPU). This method based on instruction type, precision and operand values.
334
- 3. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 3, Issue 3, October- December (2012), © IAEME
In[9] Donno et al. presented a methodology in which low-power clock trees are obtained
through aggressive exploitation of the clock-gating technology. In[8] M.Kamaraju1, Dr.K.Lal
Kishore presented a power optimized ALU for efficient data path with clock gating technique
and achieved a saving of 33.3% power dissipation. In[7] M.Kamaraju, Dr.K.Lal Kishore
presented a FPGA based power optimized programmable embedded controller with a power
dissipation of 15mw and with a frequency of operation of 15Mhz. In [5] Khaja Mujeebuddin
Quadry and Dr. Syed Abdul Sattar presented FPGA based design of low power 16 bit
processor with a power dissipation of 25mw with operating frequency of 30.931Mhz, and a
saving of 21% is achieved after applying various low power techniques. In [6] N.Sivasankara
reddy presented a low power 16 bit processor with a power dissipation of 1.37mw, and
saving of 29% by using low power techniques. In [4] Samiappa Sakthikumaran1et al.
proposed a 16-bit non-pipelined RISC processor with 329.3 µW power dissipation and total
area of 65012 nm² using 90nm technologuy. In [3] Jagrit Kathuria et al. presented the review
of existing clock gating techniques. In [2] Ali Elkateeb presented a practical introduction to
soft-core processor design through the use of step-by-step integrating of the processor’s
components. In [1] Shmuel Wimer et al. presented a probabilistic model of the clock gating
network that allows to quantify the expected power savings and the implied overhead. They
presented expressions for the power savings in a gated clock tree and derived the optimal
gater fan-out based on flip-flops toggling probabilities and process technology parameters.
The resulting clock gating methodology achieves 10% savings of the total clock tree
switching power. However, the described approaches give little attention to integration issues
with existing design flows.
III. PROCESSOR WITH CLOCK GATING
The Design of 16 bit RISC processor is done and clock gatng is applied to the design.
The processor has 24 basic instructions involving Arithmetic, Logical, data transfer,
Branching, and Control instructions. The processor consist of 16 bit register set R0 to R7,
PC,IR,RegY and Add_reg R. The 16 bit ALU is designed to perform the arithmetic and logic
opeartions. The control unit generates the control signals according to the instruction being
executed. The state machine of the control unit has basically four states idle, fetch, decode
and execute.The processor has two flag bits zero and carry.
Figure 2. Processor Architecture
335
- 4. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 3, Issue 3, October- December (2012), © IAEME
The architecture of the processor is shown in Figure 2 it has two buses Bus_1 and Bus_2
which are driven by Mux_1 and Mux_2 respectively. Mux_1 is 8 to 1 multiplexer. Register
R0 to R6, and PC are input to Mux-1, 3 bit select line is used to select any of these registers.
The output of the Mux_1 is driving the Bus_1.The output of Bus_1 is given as input to ALU,
Bus_2,and Memory. Mux_2 is a 4 to 1 mux uses 2 bit selecet line to select ALU output,
Bus_1 and memory word, the output of Mux_2 is driving the Bus_2. The data from the
Bus_2 is loaded in to any one of 16 bit register by using the respective load _x signal from
the control unit. The instruction format of the processor is shown in Figure 3. The source and
destination registers are specified by the 3 bit address. The opcode is of 5 bits hence a total of
32 instructions are possible.
opcode Source Destinati
on
15- 1 1 9 8 7 6 5 4 3 2 1
12 1 0
Figurre 3. Instruction format
There is a provision for increasing the number of instructions and number of registers , as 4
bits are left for future use. In case source or destination is a memory location then an address
of the memory location is mentioned in the second word of the the instruction. The program
counter holds address of the current instruction to be executed. The contents of the program
counter are transferred to address register through Bus-1 and Bus-2.The contents of the
memory pointed by the address register are transferred to the instruction register through
Bus_2 and the program counter is incremented. The instruction is decoded by the control unit
and the control signals are generated by the control unit to perform the operation .Once the
processor is designed, then verified the functionality for all the instructions. The
optimization of the the processor for power dissipation is done by applying the clock gating
technique at fine grained level. In clock gating technique clock is disabled to a circuit to save
power by eliminating power dissipation on clock network by preventing unnecessary activity
in logic modules. In the processor architecture, identified the units for which the clock is to
be gated and the condition for the gating is evaluated separately for each of the module. In the
case of register file only source and destination registers are used in the execution and the
other registers are in idle condition hence the clock is masked with the AND gate by using an
enable signal
Figure 4. Clock gating at fine grained level
336
- 5. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 3, Issue 3, October- December (2012), © IAEME
The Figure 4.shows how the various modules are connected to the clock through the masking
AND gate with an enable signal. The condition for activating the enable signal for various
modules is found out by care fully analyzing the functionality and timing diagram of the
processor.
Figure 5. Clock Enable signal for Flip Flop
The Figure 5 shows how a clock enable signal is derived for flip flop, whenever there is a no
change between previous output and present input the clock signal is masked by the clk_en
signal which is computed by perfroming the XOR opeartion between D and Q. Here the cost
we are paying for saving the power is extra logic circuit overhead. The carry and zero flag
registers are implemented with this method.
The group enable signal is generated for 16 bit registers as they consist of group of flip
flops. The instruction based clock gating is also incorporated by carefully partitioning the
CPU registers into blocks and a common clock enable signal is derived to turn off the register
group independently[9].
IV. SIMULATION RESULTS
The Figure 6 shows simulation results from 485ns to 715ns, of the simulation time ,
SUB instruction is executed from 485ns to 525ns (4 cycles), BRZ instruction is executed
from 525ns to 555ns (3 cycles), ADD instruction is executed from 555ns to 595ns (4 cycles),
AND instruction is executed from 595ns to 635ns (4 cycles), OR instruction is executed from
635ns to 675ns (4 cycles), XOR instruction is executed from 675ns to 715ns (4 cycles), The
control signals to execute the above instructions can be seen in the figure.The processor is
tested by executing the number of test programs from the test bench and verified the
functionality.
Figure 6. Simulation results
337
- 6. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 3, Issue 3, October- December (2012), © IAEME
Figure 7. Net power usage
Table 1. Power and Area report
16 bit Cells/ Leakage Dynamic Total
Processor Area(nm²) power power (nw) Power(nw)
(nw)
w/o low 1123/ 347302.
170..928 347131.269
power 34378 197
With low 1141/34638 179. 588 281869.759 282049.
power 348
The figure 7 shows the net power usage report generated by cadence Encounter(R) RTL
compiler.The Table 1 shows the number of cells, cell area, leakage and dynamic power
dissipation of the processor with and without applying clock gating technique. We have
observed that 23.15% power saving is achieved after the application of clock gating
technique.
V. CONCLUSION
The 16 bit processor with 90nm technology is designed, simulated, verified the
functionality and Power optimization is done by applying the clock gating technique at fine
grained level. Instruction level clock gating is done by grouping the modules according to
the instructions. The activation functions for enabling and disabling the clock for group of
flip flops is evaluated care fully. The power and area are evaluated before and after the
application of clock gating technique and is shown in Table 1, it is observed that an overall
saving of 23.15% of power is achieved. The absolute power dissipation of the designed
processor is 282049.348 nw, compared to the 16 bit processor designed using 90nm
technology presented in [4] it is less. The frequency of operation of the designed processor is
226MHz. The leakage power dissipation is going to increase as the technology is scaled
down which can be reduced by applying power gating technique in combination of clock
gating technique.
338
- 7. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 3, Issue 3, October- December (2012), © IAEME
REFERENCES
[1] Shmuel Wimer, Israel Koren, “The Optimal Fan-Out of Clock Network for Power
Minimization by Adaptive Gating”, IEEE transactions on very large scale integration
(vlsi) systems, vol. 20, no. 10, pp.1772-1780, october 2012.
[2] Ali Elkateeb “A Processor Design Course Project: Creating Soft-Core MIPS Processor
Using Step-by-Step Components Integration Approach“ International Journal of
Information and Education Technology, Vol. 1, No. 5,pp.432-440, December 2011.
[3] Jagrit Kathuria,M.Ayoub khan, Arti noor, “A review of clock gating techniques”,
International Journal of Electronics and Communication Engineering, , Vol. 1, No. 2
pp.106-114,, Aug 2011.
[4] Samiappa Sakthikumaran, S. Salivahanan, V. S. Kanchana Bhaaskaran, “16-Bit RISC
Processor Design for Convolution Application” proceedings of IEEE-International
Conference on Recent Trends in Information Technology, ICRTIT 2011 978-1-4577-
0590-8
[5] Khaja Mujeebuddin Quadry, Dr. Syed Abdul Sattar, Design of 16 bit low power
processor”, (IJCSIS), International Journal of Computer Science and Information
Security Vol. 10, No. 6, pp.67-71 June 2012, ISSN 1947-5500
[6] N.Sivasankara reddy, “minimization of power dissipation in 16 bit processor using low
power techniques” Asian Journal of Applied Sciences 4(6):657-662, 2011 ISSN 1996-
3343.
[7] M.Kamaraju, K.Lal Kishore, A.V.N.Tilak, “ Power Optimized ALU for Efficient
Datapath”, International Journal of Computer Applications (0975 – 8887)Volume 11–
No.11,pp.39-43, December 2010
[8] M.Kamaraju, K.Lal Kishore, A.V.N.Tilak, “Power optimized programmable embedded
Controller”, International Journal of Computer Networks & Communications (IJCNC),
Vol.2, No.4,pp 97-107 July 2010
[9] Monica Donno, Enrico Macii, Luca Mazzoni“ power aware clock-tree planning”
Proceedings of the 2004 international symposium on Physical design Pages 138-147New
York, NY, USA ©2004 ISBN:1-58113-817-2
[10] Jochen Preiss, Maarten Boersma, Silvia Melitta Mueller “Advanced Clockgating
Schemes for Fused-Multiply-Add-Type Floating-Point Units” 19th IEEE International
Symposium on Computer Arithmetic pp.48-56. 2009
[11]Hans Jacobson Pradip Bose Zhigang HuRick Eickemeyer Lee Eisen John Griswell
“Stretching the Limits of Clock-Gating Efficiency in Server-Class Processors”
Proceedings of the 11th Int’l Symposium on High-Performance Computer Architecture
(HPCA-11 2005) 1530-0897/05 © 2005 IEEE
[12] D. Duarte, V. Narayanan, M. J. Irwin, “Impact of Technology Scaling in the Clock
System Power,” IEEE Computer Society Annual Symposium on VLSI, pp. 52-57,
Pittsburgh, PA, April 2002.
[13] D. Duarte, V. Narayanan, M. J. Irwin, “A Clock Power Model to Evaluate Impact of
Architectural and Technology Optimizations,” IEEE Transactions on VLSI Systems,
Vol. 10, No. 6, pp. 844-855, December 2002.
[14]Jaewon Oh and Massoud Pedram “Gated Clock Routing for Low-Power
Microprocessor Design” IEEE transactions on computer-aided design of integrated
circuits and systems, vol. 20, no. 6, pp 715-722, june 2001
[15]T. Mudge, “Power: A First-Class Architectural Design Constraint,” IEEE Computer,
Vol. 34, No. 4, pp. 52-58, April 2001.
339
- 8. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 3, Issue 3, October- December (2012), © IAEME
[16]V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, F. Baez, “Reducing Power in High-
Performance Microprocessors,” DAC-35: ACM/IEEE Design Automation Conference,
pp. 732-737, San Francisco, CA, June 1998.
[17] Raj Kumar Tiwari and Santosh Kumar Agrahari, “Low Power Arm Processor Based
Embedded System” International journal of Electronics and Communication Engineering
&Technology (IJECET), Volume3, Issue2, 2012, pp. 369 - 374, Published by IAEME
[18] B.K.V.Prasad, P.Satishkumar, B.Stephencharles, T.Prasad, “Low Power Design Of
Wallance Tree Multiplier” International journal of Electronics and Communication
Engineering &Technology (IJECET), Volume3, Issue3, 2012, pp. 258 - 264, Published
by IAEME
[19] P.Sreenivasulu, Krishnna veni ,Dr. K.Srinivasa Rao and Dr.A.VinayaBabu, “Low Power
Design Techniques Of Cmos Digital Circuits” International journal of Electronics and
Communication Engineering &Technology (IJECET), Volume3, Issue2, 2012, pp. 199 -
208, Published by IAEME
AUTHORS PROFILE
Khaja Mujeebuddin Quadry (Member IEEE), Presently working as Associate Professor
& Head Of Department ECE, RITS, Chevella, Hyderabad, A.P., India. He has obtained
Diploma in Electronics and communication Engineering from state board of Technical
Education and Training, A.P India in 1993, BE Degree in Electroics and Communication
Engineering from Osmania University in 1997, ME Degree in VLSI & Embedded System
Design from Osmania University in 2007. Presently he is Research scholar of JNTUA,
Anantapur, A.P., India. He is a Life member of Institution of Electronics and
Telecommunication Engineers (IETE) India. He has 6 years of Industrial experience and 8
years of Teaching Experience
Dr.Syed Abdul Sattar, is presently working as a Dean of Academics & Professor of ECE
department, RITS, Chevella, Hyderabad. He has completed his B.E. in ECE in 1990 from
Marathwada University Aurangabad, M. Tech. in DSCE from JNTU Hyderabad, in 2002, and
did his first Ph.D. in Computer Science from Golden State University USA, in 2004, and
second Ph.D. in ECE from JNTU Hyderabad, A. P. India in 2007. He is a fellow member of
Institution of Electronics and Telecommunication Engineers India, and Life member of
Indian society for Technical Education. His area of specialization is wireless
communications and image Processing. He has about 21years of experience in teaching and
industry together and recipient of national award as an Engineering Scientist of the year 2006
by NESA New Delhi, India. He has about 73 publications in International and National
Journals and conferences.Presently he is guiding research scholars in ECE and Computer
Science from different Universities. He is a member of Board of studies for a central
university and reviewer/editorial member/chief editor for national and International journals.
Dr. K. Soundara Rajan, obtained his Master’s degree and Ph.D. from IIT, Roorkee. He
has more than 30 years of teaching experience and 12 years of Research experience. He has
guided 10 Phd scholars suceesfully , presently 10 PhD scholars are under his guidance.He has
published more than 79 Publications at National and International level. He is Member of an
International Research Journal of Higher Education, Life member for ISTE, Regional
Coordinator for NAFEN (National Foundation of Indian Engineers, New Delhi.). He is
former Principal and Rector of JNTUA , Presently he is OSD to Vice Chancellor at JNTUA,
Anantapur, A.P., India.
340