An Efficient VFI-Based NoC Architecture Using Johnson-Encoded Reconfigurable FIFOs
Amir-Mohammad Rahmani¹,², Pasi Liljeberg², Juha Plosila², and Hannu Tenhunen¹,²
¹ Turku Centre for Computer Science (TUCS), Turku, Finland
² Computer Systems Lab., Department of Information Technology, University of Turku, Finland
Email: {amir.rahmani, pasi.liljeberg, juha.plosila, hannu.tenhunen}@utu.fi
978-1-4244-8971-8/10$26.00 © 2010 IEEE
Abstract: In this paper, a Johnson-encoded Reconfigurable Synchronous/Bi-Synchronous (RSBS) FIFO is proposed which can adapt its operation to either synchronous or bi-synchronous mode. The proposed FIFO, which can be used to interface modules in Voltage/Frequency Island (VFI) based Networks-on-Chip, is capable of alleviating the excessive energy consumption and high performance overhead of conventional bi-synchronous FIFOs. The FIFO is scalable and synthesizable in synchronous standard cells. In addition, a technique for mesochronous adaptation of the proposed FIFO is presented. Our extensive experiments show significant power and performance improvements compared to non-reconfigurable architectures.
I. INTRODUCTION
Interconnect links will impose a number of limits on complexity, reliability, and throughput in nanoscale system design. Network-on-chip (NoC) has been proposed to mitigate the ever-increasing communication complexity of modern many-core system-on-chip (SoC) designs [1][2]. In addition, achieving power efficiency has become an increasingly difficult challenge, especially in the presence of increasing die sizes, high clock frequencies, and variability-driven design issues. Globally Asynchronous Locally Synchronous (GALS) [3][4] based NoCs implemented using a multiple Voltage/Frequency Island (VFI) design style have become an attractive alternative to traditional designs [5]. In fact, VFI-based approaches can be used to minimize the system power dissipation under performance constraints.
Assignment of frequencies and voltages to VFIs can be done using either offline or online methods [6]. Offline methods can be used when the behavior of an application is very predictable for various input conditions and the worst-case behavior is not very different from the average-case behavior [7]. However, such an approach is not well suited for applications that show large variations in their behavior for different input conditions. For such systems, online methods are more suitable [7][8]. Dynamic Voltage and Frequency Scaling (DVFS) schemes can be used to adapt the system to meet the performance requirements of a dynamically changing workload while minimizing power consumption.
In order to benefit from the VFI-based scheme, communication between islands should be carried out using mixed-timing (bi-synchronous) FIFOs [9], which accommodate the clock frequency discrepancy; however, due to the overhead of implementing these FIFOs in terms of latency, area, and power consumption, the associated design complexity increases.
In this paper, we propose a novel Reconfigurable Synchronous/Bi-Synchronous (RSBS) FIFO based on Johnson-encoded pointers to mitigate the latency and power consumption overhead. In addition, since a NoC partitioned into VFIs requires a method to embed and exploit these FIFOs, we have developed a controller for the switch input channels which can adaptively decide the operating mode of the FIFOs.
The remainder of this paper is organized as follows. Section II describes the related work. Section III elaborates the demands for enhancement of the existing architectures, while the architecture of the reconfigurable FIFO is presented in detail in Section IV. Section V shows the experimental results and analyzes the impact of our technique on a video conference encoding part as a case study. Finally, Section VI draws conclusions.
II. RELATED WORK
There have been many efforts to design low-latency asynchronous communication mechanisms between synchronous blocks. Some of them include two flip-flop synchronizers or an asynchronous FIFO using Gray code [9][10], Johnson code [11], or a ring counter [12][13] for read and write pointers, while others consider stoppable clocks [3]. Our proposed FIFO is based on Johnson encoding and has two substantial advantages over the FIFO proposed in [11]. Firstly, it adds reconfigurability to bi-synchronous FIFOs to avoid their associated power and latency overhead in cases where their synchronization parts are not needed. Secondly, in addition to a register-based implementation using one-hot addressing, it supports a standard memory-based implementation addressed by normal binary code.
There have been several design efforts to combine the benefits of the GALS-based NoC interconnect mechanism with the VFI-based design style [14][15]. For instance, the authors of [16] present design methodologies for partitioning an NoC architecture into multiple VFIs and assigning frequency, supply voltage, and threshold voltage levels to each VFI according to given performance constraints at design time. On the other hand, there have been many works that propose hardware-based approaches to dynamically change the frequencies and potentially voltages of a VFI system driven by a dynamic workload [6][8]. However, due to their high latency and power overhead, bi-synchronous FIFOs cannot be exploited freely. To the best of our knowledge, there is only one study proposing reconfigurable FIFOs in which FIFOs can work in two distinct modes and accordingly devote much more
flexibility to both dynamic and static voltage/frequency assigning techniques. In [17], in which the authors present a Gray-encoded reconfigurable FIFO, there are still performance and power overheads, even in the best case, due to the existence of pointer counters and the complexity of the fullness/emptiness checking logic. Moreover, the Gray-encoded design style can only support FIFO capacities that are powers of two. The FIFO presented in this paper, which is an improved extension of those reconfigurable FIFOs, is capable of circumventing the aforementioned issues. As we will describe later, these restrictions can be mitigated considerably by utilizing the proposed Johnson-encoded reconfigurable FIFOs.
III. MOTIVATION AND CONTRIBUTION
A VFI can consist of a single processing element (PE) or, depending on the physical or design considerations, may contain a group of PEs. Each VFI is assumed to have a voltage level above a certain value Vmin and, since the architecture is globally asynchronous, locally synchronous [3], each module or core is assumed to be clocked by a local ring oscillator or a central clock generator controlled by a variable intra-island supply voltage [5]. In such systems, the assignment of voltage/frequency to each island can be classified into Static and Dynamic Voltage/Frequency Assigning techniques.
In Dynamic Voltage/Frequency Assigning (DVFA) techniques, each individual processing element is usually a locally synchronous module operating with its own clock, either being a single VFI or forming a VFI with another synchronous module. This enables dynamic voltage/frequency scaling in each synchronous module using a DC-DC voltage regulator and a central or local variable-delay ring oscillator maintaining the clock of the VFI. There is usually a discrete set of frequency and voltage levels (usually 2 to 6 levels) assigned by methods such as forecasting. For the sake of adaptivity, all cores should benefit from bi-synchronous FIFOs. Let us consider a frequent case in which adjacent cores in a NoC work at the same frequency level (e.g., they belong to the same VFI in the current timing window). In this situation, although the read and write clock signals of their FIFOs have equal frequencies, the pointers are still passed through synchronizer blocks for asserting the full and empty signals. However, these FIFOs can be informed about their equal read and write clock frequencies. A reconfigurable FIFO capable of operating in both synchronous and bi-synchronous modes can cope with this by bypassing and switching off the unused components (e.g., synchronizers, code converters), resulting in considerable improvement in terms of latency, throughput, and power consumption.
In order to highlight the importance of reconfigurable FIFOs, we have embedded a simple hardware block called Mode Selector in the input channel of a RASoC-based NoC switch [18]. As can be seen from Figure 1, this module is responsible for recognizing the equality of the write (provided by the output channel of the adjacent switch) and read clock frequencies and directing the buffer to operate in synchronous or bi-synchronous mode.
The input channel module shown in Figure 1 consists of five different units: IFC (Input Flow Controller), RSBS IB (Reconfigurable Synchronous/Bi-Synchronous Input Buffer), IC (Input Controller), IRS (Input Read Switch), and Mode Selector (which is added to the basic architecture of the input channel module to support the RSBS FIFO). The RSBS IB block is a dual-mode FIFO buffer, while the IC block of each input channel performs the routing function, its IRS block receives the x_rd and x_gnt signals and triggers the rd signal of the RSBS IB block, and the IFC block implements the logic that performs the translation between the handshake and the FIFO flow control protocol. Each channel includes n bits for data and two bits for packet framing: begin-of-packet ((n+2)th bit) and end-of-packet ((n+1)th bit). The IFC, IC, and IRS modules are described in detail in [18].
Figure 1. Input channel module architecture with support of Reconfigurable Syn/Bi-Syn FIFOs
The Static Voltage/Frequency Assigning (SVFA) problem for a given component graph G(V,E), which is characterized by the set of nodes V = {1, 2, …, n} and the set of edges E = {(i,j) | i precedes j}, can be stated as [8]:
Given a component graph G(V,E) comprised of a set of processes mapped on a set of processing elements (PEs), find the optimal voltage and clock frequency to be assigned statically to each PE such that the energy per operation is minimized and rate and/or latency constraints are satisfied.
After the voltage and frequency have been assigned to each island at design time, the VFIs are formed, and bi-synchronous and synchronous FIFOs are employed for inter-island and intra-island communication, respectively. According to recent work in this field [7][8], a VFI-based NoC is generally partitioned into 2 to 4 islands because, if a larger number is selected, the overhead of the bi-synchronous FIFOs will diminish the power savings gained by the VFI architecture.
Designs using SVFA techniques are advantageous in the case of a system where an oracle has pre-existing knowledge of the number of run-time cycles used in each PE for processing each sample of the application under consideration. Moreover, since all of the SVFA stages are done at design time and for a specific application, such a system is not practical for other applications. For instance, a design such as a MultiProcessor System-on-Chip (MPSoC), which is typically designed to map and run multi-purpose applications, cannot benefit from SVFA techniques as they stand; it becomes reasonable to exploit these techniques by using the proposed RSBS FIFOs and configuring the FIFOs for the respective application after each mapping process.
In the next section, we present the architecture of the proposed RSBS FIFO, which is based on Johnson encoding. It will be seen how this simple technique can considerably reduce the overall NoC power consumption as well as improve latency and throughput.
IV. JOHNSON-ENCODED RECONFIGURABLE SYNCHRONOUS/BI-SYNCHRONOUS FIFO
In this section, we present a reconfigurable FIFO design approach based on Johnson encoding and discuss its
benefits over Gray-encoded FIFOs. It should be noted that this design style is scalable and synthesizable in synchronous standard cells. The proposed RSBS FIFO is a bi-synchronous FIFO [10] able to interface two synchronous systems with independent clock frequencies. To avoid metastability [19] and to synchronize the pointers between two independent clock domains, it uses two synchronizers, one for the write pointer and one for the read pointer.
As shown in Figure 2, similar to most bi-synchronous FIFOs, five typical modules compose the RSBS FIFO architecture: FIFO Memory block, sync_r2w, sync_w2r, FIFO rptr & empty, and FIFO wptr & full. The FIFO Memory block is a buffer accessed by both the write and read clock domains. This buffer is most likely an instantiated, synchronous dual-port RAM, but other memory styles can also be adapted to function as the FIFO buffer. The sync_r2w (sync_w2r) module is a synchronizer used to synchronize the read (write) pointer into the write (read) clock domain in the bi-synchronous mode. The FIFO rptr & empty block is completely synchronous to the read-clock domain and contains the FIFO read pointer and empty-flag logic. Similarly, the FIFO wptr & full block is completely synchronous to the write-clock domain and contains the FIFO write pointer and full-flag logic.
In the proposed design style, to provide reconfigurability, we have exploited two multiplexers and two flip-flops to bypass unused components in the synchronous mode. For this purpose, we added the Syn/Bi-Syn_Mode signal indicating the operation mode of the FIFO (synchronous or bi-synchronous). Before describing the main function of the RSBS FIFO, let us first focus on the properties of Johnson encoding [11] for the FIFO read and write pointers and the internal structure of the FIFO wptr & full (FIFO rptr & empty) block.
Figure 2. Reconfigurable Syn/Bi-Syn FIFO architecture
As discussed earlier, Gray code presents some limitations in terms of implementation complexity. The first limitation is that Gray code allows encoding only power-of-two ranges, while the FIFO size may be optimal at a value that is not a power of two. The second limitation is that, in contrast to binary encoding with the full-adder standard cell, there is no elementary logical operator to perform an addition in Gray encoding. Hence, the increment of the pointers needs to be hardwired at the cost of more area and lower performance. In order to cope with this issue, we use Johnson encoding for the read and write pointers.
Johnson encoding is another code with a Hamming distance of 1 between consecutive elements, which allows a safe synchronization of the pointers. To implement the sequence, bits are chained in series as in a shift register, and the loop is closed using an inverter, so that the least significant bit is implemented as the negation of the most significant bit. To distinguish FIFO fullness from emptiness, a parity bit is added to the binary pointers, virtually doubling the addressing range of the pointers [20]. This parity method is extensible to both Gray and Johnson encodings, but it is simpler for Johnson encoding because of the twisted-ring sequence. When Johnson encoding is used, the buffer is empty if write_pointer = read_pointer, and it is full if write_pointer = NOT read_pointer.
The architecture of the FIFO wptr & full (FIFO rptr & empty) block is shown in Figure 3. This module consists of a Johnson-encoded register that generates the n-bit pointer to be synchronized into the opposite clock domain. In addition, it uses a Johnson-to-binary converter and another register (Binary register) to address the FIFO memory directly, without the need to translate memory addresses, as well as one Full (Empty) Detector block to check the fullness (emptiness) of the FIFO.
Figure 3. FIFO wptr & full block diagram
The main target of the proposed RSBS FIFO is to bypass and switch off the unused components and obtain a simple synchronous FIFO (without the additional components such as synchronizers) in the synchronous mode. In the proposed design style, once the FIFO receives the command to operate in the synchronous mode via the Syn/Bi-Syn_Mode signal, the blocks highlighted with gray circles in Figure 2 are removed from the FIFO path and switched off. Since in the synchronous mode it is not necessary to synchronize the pointers into opposite clock domains, we bypass the synchronizers and produce the empty and full flags using unsynchronized Johnson-encoded pointers. When the mode changes to bi-synchronous, the disabled blocks are turned on again.
As a result of bypassing the synchronizers in both the full and empty detection stages, the FIFO shows considerable latency, throughput, and power improvement when it operates in the synchronous mode; hence it is a suitable FIFO architecture for DVFA techniques. It should be emphasized that in NoC systems which use SVFA techniques, the positions of the islands are constant. These NoC systems are not appropriate to be mapped for various applications at different times. Therefore, it is
desirable to have synchronous FIFOs for intra-island communication which do not have extra latency and power consumption overhead. As a result, if the area overhead of the inactive components is acceptable for the system, exploiting the proposed RSBS FIFO in SVFA-based systems is quite efficient.
In some cases, it is not possible to exploit a central clock generator for NoC-based systems (specifically for DVFA-based ones). In these situations, each node has its own clock generator (phase-locked loop) and the FIFO architecture should be adapted to interface mesochronous clock domains, where the sender and the receiver have the same clock frequency but different phases. To this end, the RMBS (Reconfigurable Mesochronous/Bi-Synchronous) FIFO should be utilized instead of the RSBS one. The phase difference can be constant or slowly varying. According to [19], metastability can be avoided when the rising edges of the clock signals are predictable, and the two registers in the synchronizer can be reduced to a single register.
Figure 4. Synchronization part of the Reconfigurable Meso/Bi-Syn FIFO architecture
As an example, Figure 4 shows the proposed design, which we have modified for correct emptiness and fullness detection in the mesochronous mode. In this case, two registers are added and clocked using a delayed version of the read/write clock in the mesochronous mode. This delay must be chosen so that the data are exchanged without metastable situations. The delay can be a programmable delay or any other metastability-free solution, for example the Chakraborty-Greenstreet architecture [21], which also allows the FIFO to work with plesiochronous (small frequency difference) clocks. Likewise, if the write and read clocks are out of phase by 90°, 180°, or 270°, no programmable delay is needed because, by construction, the communication is free of metastability. Although the RMBS FIFO is not as efficient as the RSBS FIFO, it still improves the FIFO throughput, and in each mode, if the synchronizers used for the other mode are turned off, unnecessary power consumption is prevented.
V. ANALYSIS AND CASE STUDY
We have simulated the reconfigurable FIFO to characterize its latency, throughput, area, and power consumption. Note that, to observe the power efficiency of the FIFO, we have employed it in a NoC-based MPEG system as a case study and compared it to a similar system using conventional bi-synchronous FIFOs.
For the latency analysis, as the sender and the receiver have different clock signals, the latency of the FIFO depends on the relation between these two signals. The latency can be decomposed into two parts: the state machine latency and the synchronization latency. As the state machine is designed using a Moore automaton, its latency is one clock cycle. In the bi-synchronous mode, s registers compose the synchronizers and the latency is ΔT plus one clock cycle, where ΔT is the difference, in time, between the rising edges of the sender and receiver clocks. As this difference is between zero and one Clk_read clock cycle, the latency of the RSBS FIFO is between s and s+1 Clk_read clock cycles in the bi-synchronous mode. In the synchronous mode, there is no difference between Clk_read and Clk_write and there is no synchronizer, and hence data can be fetched by the receiver on the next rising/falling edge of Clk_read.
In order to evaluate the throughput, the RSBS FIFO should be analyzed for each operation mode as a function of the FIFO depth. In the bi-synchronous and mesochronous modes, as the synchronizers add latency, the performance of the flow control of the FIFO is penalized. In the case of a deep FIFO, those latencies do not decrease the FIFO throughput since the buffered data compensate for the latency of the flow control. When the FIFO operates in the synchronous mode, the minimum FIFO depth required to provide maximum throughput decreases because there is no need for synchronizers. Table 1 shows the minimum FIFO depth for 50% and 100% throughput as a function of the clocking mode. Note that for the bi-synchronous mode analysis, the write and read clock frequencies are equal; otherwise it is not possible to obtain 100% throughput.
Table 1. Minimum FIFO depth as a function of the clock relation and required throughput
Mode            Minimum depth for 50% throughput    Minimum depth for 100% throughput
Bi-syn. Mode    3                                   6
Meso. Mode      2                                   4
Syn. Mode       1                                   2
The area of the FIFOs was computed after synthesis on CMOS 90 nm GPLVT STMicroelectronics standard cells using Synopsys Design Compiler. Different FIFO depths are used to illustrate the scalability of the architecture. Table 2 shows the area of the 16-bit and 32-bit Gray-encoded RSBS, Gray-encoded RMBS, Johnson-encoded RSBS, and Johnson-encoded RMBS FIFOs as a function of the FIFO depth.
Table 2. Area and overhead comparison between the Johnson-encoded and Gray-encoded design styles and the baseline design
Style                           4×16 (µm²)    4×32 (µm²)    8×16 (µm²)    8×32 (µm²)
Gray-encoded RSBS FIFO [17]     3434          5888          6354          11415
Gray-encoded RMBS FIFO [17]     3530          5983          6509          11570
Johnson-encoded RSBS FIFO       3331          5795          6267          11332
Johnson-encoded RMBS FIFO       3420          5877          6416          11463
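The relative area difference between the two encodings can be worked out directly from the figures reported in Table 2. The following Python snippet is a minimal sketch of that arithmetic. It assumes that a column label such as 4×16 means FIFO depth × data width in bits; the names AREA_UM2, area_saving_percent, and savings_table are illustrative and do not come from the paper.

```python
# Post-synthesis areas in µm², copied from Table 2.
# Assumption: a column label such as "4×16" means (FIFO depth, data width in bits).
AREA_UM2 = {
    "Gray-encoded RSBS": {(4, 16): 3434, (4, 32): 5888, (8, 16): 6354, (8, 32): 11415},
    "Gray-encoded RMBS": {(4, 16): 3530, (4, 32): 5983, (8, 16): 6509, (8, 32): 11570},
    "Johnson-encoded RSBS": {(4, 16): 3331, (4, 32): 5795, (8, 16): 6267, (8, 32): 11332},
    "Johnson-encoded RMBS": {(4, 16): 3420, (4, 32): 5877, (8, 16): 6416, (8, 32): 11463},
}


def area_saving_percent(baseline: str, candidate: str, config: tuple) -> float:
    """Relative area reduction of `candidate` versus `baseline`, in percent."""
    base = AREA_UM2[baseline][config]
    cand = AREA_UM2[candidate][config]
    return 100.0 * (base - cand) / base


def savings_table(kind: str) -> dict:
    """Johnson-vs-Gray saving for every configuration of one FIFO kind ("RSBS" or "RMBS")."""
    gray, johnson = f"Gray-encoded {kind}", f"Johnson-encoded {kind}"
    return {cfg: area_saving_percent(gray, johnson, cfg) for cfg in AREA_UM2[gray]}


if __name__ == "__main__":
    for kind in ("RSBS", "RMBS"):
        for (depth, width), pct in savings_table(kind).items():
            print(f"{kind} {depth}x{width}: Johnson-encoded is {pct:.2f}% smaller")
```

On the Table 2 figures, the Johnson-encoded variants come out smaller in every configuration, by roughly 0.7% to 3.1%, and the relative saving tends to shrink as the FIFO gets larger.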
We apply the proposed RSBS-FIFO-based switch to the NoC-based MPEG-4 decoder described in [22] and compare it with a similar system which does not benefit from the reconfigurable FIFOs. The MPEG-4 decoder system is modeled and mapped on a 5×3 NoC. In the system, each node has a 5×5 crossbar switch. Since MPEG videos show a lot of variability in processing time depending on the type of frame being processed, we perform prediction-based dynamic voltage/frequency assigning on each node based on the DVFA algorithm proposed in [8]. The prediction decision is taken at the start of processing of a new macroblock at each node, and for each input channel we add a synchronous/bi-synchronous mode selector unit. The simulation is performed for three different frequency sets having 2, 4, and 6 frequency levels. We assume that the switches are clocked by a central clock generator block; therefore they do not need the mesochronous adaptation.
Figure 5 shows the average power saving percentage of the NoC switches achieved by exploiting the Johnson-encoded RSBS FIFOs instead of the conventional bi-synchronous FIFOs. The comparison is made between the RSBS FIFO and its baseline counterpart [10] for three different frequency sets and two different data widths. As the results show, we obtain savings of around 5.2% to 17% over the baseline architecture.
Figure 5. Average power savings for the NoC switches used in MPEG Encoder
VI. SUMMARY AND CONCLUSION
In this paper, a Johnson-encoded Reconfigurable Synchronous/Bi-Synchronous FIFO was proposed which can operate in either synchronous or bi-synchronous mode. The FIFO addresses the synchronization power and latency overhead in the case where adjacent switches in the NoC system operate at the same clock frequency but suffer from unnecessary synchronization. A technique for mesochronous adaptation of the FIFOs has been suggested. Our results revealed that, compared to a non-reconfigurable system architecture, the Johnson-encoded RSBS FIFO can help to achieve considerable savings in the average power consumption of NoC switches and to improve the total average packet latency significantly in the case of an MPEG-4 encoder application.
REFERENCES
[1] A. Jantsch and H. Tenhunen, Networks on Chip. Kluwer Academic Publishers, 2003.
[2] L. Benini and G. De Micheli, “Networks on chips: a new SoC paradigm,” IEEE Computer, Vol. 35, No. 1, 2002, pp. 70–78.
[3] D. M. Chapiro, “Globally asynchronous locally synchronous systems,” Ph.D. dissertation, Dept. Comput. Sci., Stanford University, Stanford, CA, 1984.
[4] J. Muttersbach et al., “Practical design of globally asynchronous locally synchronous systems,” in Proc. of Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, 2000, pp. 52–59.
[5] D. E. Lackey et al., “Managing power and performance for system-on-chip designs using voltage islands,” in Proc. of IEEE/ACM Int. Conf. on Computer Aided Design, 2002, pp. 195–202.
[6] P. Choudhary and D. Marculescu, “Power Management of Voltage/Frequency Island-Based Systems Using Hardware-Based Methods,” IEEE Transactions on VLSI Systems, Vol. 17, No. 3, 2009, pp. 427–438.
[7] U. Y. Ogras, R. Marculescu, D. Marculescu, and E. G. Jung, “Design and Management of Voltage-Frequency Island Partitioned Networks-on-Chip,” IEEE Transactions on VLSI Systems, Vol. 17, No. 3, 2009, pp. 330–341.
[8] K. Niyogi and D. Marculescu, “Speed and voltage selection for GALS systems based on voltage/frequency islands,” in Proc. of ACM/IEEE Asian-South Pacific Design Automation Conf., 2005, pp. 292–297.
[9] T. Chelcea and S. M. Nowick, “Robust interfaces for mixed-timing systems,” IEEE Transactions on VLSI Systems, Vol. 12, No. 8, 2004, pp. 857–873.
[10] C. Cummings and P. Alfke, “Simulation and synthesis techniques for asynchronous FIFO design with asynchronous pointer comparison,” in SNUG-2002, San Jose, CA, 2002.
[11] Y. Thonnart et al., “Design and Implementation of a GALS Adapter for ANoC based Architectures,” in Proc. of Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, 2009, pp. 13–22.
[12] T. Ono and M. Greenstreet, “A Modular Synchronizing FIFO for NoCs,” in Proc. of Int. Symp. on Networks-on-Chip, 2009, pp. 224–233.
[13] I. Panades and A. Greiner, “Bi-synchronous FIFO for synchronous circuit communication well suited for network-on-chip in GALS architectures,” in Proc. of Int. Symp. on Networks-on-Chip, 2007, pp. 83–94.
[14] J. Quartana, S. Renane, A. Baixas, L. Fesquet, and M. Renaudin, “GALS systems prototyping using multiclock FPGAs and asynchronous network-on-chips,” in Proc. of Int. Conf. on Field Programmable Logic and Applications, 2005, pp. 299–304.
[15] G. Campobello, M. Castano, C. Ciofi, and D. Mangano, “GALS networks on chip: A new solution for asynchronous delay-insensitive links,” in Proc. of Design, Automation and Test in Europe Conf., 2006, pp. 1–6.
[16] C.-L. Chou et al., “Energy- and Performance-Aware Incremental Mapping for Networks on Chip With Multiple Voltage Levels,” IEEE Transactions on CAD, Vol. 27, No. 10, 2008, pp. 1866–1879.
[17] A.-M. Rahmani et al., “Power and Performance Optimization of Voltage/Frequency Island-Based Networks-on-Chip Using Reconfigurable Synchronous/Bi-Synchronous FIFOs,” in Proc. of ACM International Conference on Computing Frontiers, 2010, pp. 267–276.
[18] C. A. Zeferino, M. E. Kreutz, and A. A. Susin, “RASoC: A router soft-core for networks-on-chip,” in Proc. of Design, Automation and Test in Europe Conf., 2004, pp. 198–203.
[19] F. Mu and C. Svensson, “Self-tested self-synchronization circuit for mesochronous clocking,” IEEE Transactions on Circuits and Systems-II, Vol. 48, No. 2, 2001, pp. 129–140.
[20] R. Apperson et al., “A Scalable Dual-Clock FIFO for Data Transfers Between Arbitrary and Haltable Clock Domains,” IEEE Transactions on VLSI Systems, Vol. 15, No. 10, 2007, pp. 1125–1134.
[21] A. Chakraborty and M. R. Greenstreet, “Efficient self-timed interfaces for crossing clock domains,” in Proc. of 9th IEEE Int. Symp. on Asynchronous Circuits and Systems, 2003, pp. 78–88.
[22] E. B. Van der Tol and E. G. T. Jaspers, “Mapping of MPEG-4 Decoding on a Flexible Architecture Platform,” SPIE 2002, pp. 1–13.