
Latency Reduction of Selected Data Streams in
Network-on-Chips for Adaptive Manycore Systems
Thilo Pionteck, Christoph Osterloh
Institute of Computer Engineering
Universität zu Lübeck
23538 Lübeck, Germany
Email: {pionteck, osterloh}@iti.uni-luebeck.de

Carsten Albrecht
Dräger Medical GmbH
23558 Lübeck, Germany
Email: Carsten.Albrecht@draeger.com

Abstract—This paper reviews Network-on-Chip architectures with prioritization of selected data streams targeting runtime reconfigurable manycore systems. The common idea of these architectures is to minimize the latency of selected packet transmissions by either bypassing or parallelizing processing stages in routers or by using dedicated links bypassing complete routers. Potential classes of selected data streams are latency-critical messages, e.g. cache accesses in multiprocessor systems, and semi-static data streams, i.e. streams in systems in which the same components continuously exchange data for a longer period. The review categorizes the diverse architectures and evaluates their pros and cons in terms of universality, hardware efficiency and support of changing traffic patterns.

978-1-4244-8971-8/10$26.00 © 2010 IEEE

I. INTRODUCTION

With the emergence of manycore systems and the increased need for scalable global on-chip communication architectures, Network-on-Chips (NoCs) are becoming the dominant communication architecture for complex system designs. Compared to shared buses and point-to-point connections, NoCs feature high scalability, high throughput and cost efficiency in terms of area and power [1]. The main drawback of NoCs is that packets have to pass several routers along their path. At each router, packets compete for router resources while going through a complex processing pipeline [2]. Depending on the number of hops, this results in a significant communication latency, limiting the overall system performance. In case of a large number of consecutive packets following the same path, some internal router processing steps are not even required, as, for example, the routing decision will be the same for all packets. If these processing steps are left out, latency as well as power consumption can be reduced.

A way to minimize the number of hops is to customize the network topology according to the communication characteristics of the processing elements for a given application scenario. Yet, this contradicts the idea of a universal communication architecture, increases the design effort, and is not applicable for systems with a wide range of application scenarios. This approach also implies that type and location of processing cores are fixed during the lifetime of a system. For runtime reconfigurable systems where processing cores can be exchanged at runtime, this assumption cannot be kept valid. Such systems show changing communication patterns during system lifetime and ask for regular communication architectures.

In case the number of hops cannot be reduced, a communication latency reduction can be achieved by reducing the latency of individual routers. Appropriate techniques are the speculative execution of router pipeline stages in parallel [3], [4] and the pre-computation of routing decisions using look-ahead schemes [5], [6], [7]. End-to-end latency can also be reduced by using adaptive routing schemes, which allow packets to bypass nodes with high congestion. The work presented in [8] describes such a NoC in combination with the ability to bypass the router pipeline. Common disadvantages of these approaches are their increased hardware effort and a latency unsuitable for latency-critical messages. Based on the observation that only a certain share of messages is latency-critical or shows semi-static characteristics, it is favorable to prioritize this kind of data only. In [9], the composition and amount of latency-critical messages in shared-memory chip multi-processor (CMP) systems are analyzed. The authors identify protocol requests, acknowledgment packets and critical word packets in read and write transactions as the main categories of latency-critical messages. It is also shown that data traffic exhibits a strong temporal and spatial locality and accounts for about 18% of the overall traffic. Examples for semi-static data streams are given in [10]. Here, implementations of wireless communication standards are analyzed with regard to their inter-module communication characteristics. The authors show that for long periods subsequent data items of a stream follow the same route and have periodic behavior.

The aim of this paper is to provide an overview of diverse approaches for prioritizing latency-critical or semi-static data streams in NoC-based communication architectures. A focus is set on NoC architectures suitable for runtime reconfigurable manycore systems. Irregular NoC architectures such as [11], [12] are not considered, nor are NoCs based on non-mesh topologies such as [13]. The former are restricted to certain communication patterns while the latter ask for high radix routers, significantly increasing the area footprint.

Criteria for categorizing NoC architectures with latency reduction techniques are given in section II. Among these criteria, the effect of a prioritization is chosen for categorizing the NoC architectures: per end-to-end connection (section III), per router (section IV), or per path segment (section V). A discussion of pros and cons of the architectures with regard to the requirements of runtime reconfigurable manycore systems is given in section VI.

II. DESIGN SPACE

There is a huge design space for latency reduction techniques within NoCs. Focusing on NoCs suitable for adaptive manycore systems and applying the design constraints discussed in the previous section, four features that mainly distinguish the architectures are identified: effect of prioritization, switching technique, kind of decision making and link type.

Effect of prioritization: The aim of all presented NoCs is to minimize the core-to-core latency for packets. Yet, the architectures differ in the effect of their prioritization efforts. Prioritization can be applied once for a core-to-core connection, per router or per path segment. When prioritization is done on the basis of core-to-core connections, it is done only once at the packet's entry point to the communication network. This prioritization persists until the packet reaches its final destination. Applying this technique, packets will normally bypass the standard NoC infrastructure, which results in the highest potential for latency reduction. This is in contrast to the prioritization on per-hop basis. Here, prioritization has to be redetermined at every hop, which causes time overhead and thus reduces the achievable latency reduction. In between these two extremes is the prioritization on per path segment basis. Here, packets travel along prioritized path segments. These path segments may not cover the complete route from source to sink. Prioritization decisions have to be made for each path segment individually. As these three prioritization methods are mutually exclusive, they are chosen for categorizing the NoC architectures reviewed in this paper.

Switching technique: For transferring prioritized packets, either packet switching or circuit switching techniques are applied. Packet switching in this context means that a packet travels along several routers from source to destination. At each router, routing decisions and arbitration of network resources have to be done. This is in contrast to circuit switching. Here, a dedicated path from source to destination exists. Note that within this work, even for circuit-switched transmissions the data stream is divided into packets. In general, this is for reasons of comparability with the non-prioritized traffic.

Decision making: This criterion determines whether routing for the prioritized packets is made deterministically or speculatively. The former means that the routing decision is guaranteed to be correct, which is not the case for the latter. Speculative forwarding also carries the risk of generating dead flits, i.e. flits which are sent on a wrong link and which have to be deleted.

Link type: The fast path to be used by prioritized packets can either be physical or logical. A logical fast path is mapped on top of the physical network. The packets follow the same route as non-prioritized packets, only parts of the router are bypassed. Physical fast paths are dedicated connections implemented in silicon.

Typical parameter combinations of latency-optimized NoCs are presented in figure 1. Primary classification has been done according to the effect of prioritization. In addition, figure 1 shows the parameters of the individual NoCs presented in the next sections.

[Fig. 1. NoC prioritization techniques design space. Columns: prioritization; switching technique (packet-switched, circuit-switched); decision making (speculative, deterministic); link type (physical, logical); examples. Rows: core-to-core ([14], [15], [16]); per router ([17], [18], [19]); per path segment ([2], [20], [21], [22]).]

III. CORE-TO-CORE PRIORITIZATION

This section comprises NoC architectures featuring core-to-core optimization. For the fast path, they all provide direct connections from source to sink, either by using a bus [14], by providing dedicated long-range links [15] or by forming a logical topology on top of the physical topology [16]. As the first two NoC designs provide additional physical connectivity, they are sub-categorized as heterogeneous NoCs, while the latter one is sub-categorized as a logical one.

A. Heterogeneous NoCs

The main characteristic of both architectures presented here is that they employ a standard regular grid-shaped NoC for most of the traffic. Long-distance or latency-sensitive traffic is bypassed using the additional communication infrastructure. An architectural view on both NoCs is given in figure 2.

A combination of a NoC with a shared bus is proposed in [14]. The bus enhanced NoC (BENoC), as shown in figure 2(a), is composed of a packet-switched grid-shaped NoC and a low-latency, low-bandwidth bus. The bus is used for global, latency-critical control signals, provides broadcast as well as multicast capabilities, and can also be used for the configuration and management of the NoC. High-throughput data communication between cores is handled by the NoC. Performance evaluation is performed by means of a dynamic non-uniform cache access architecture (DNUCA) multi-processor system consisting of 16 processors and 64 L2 cache tiles. The authors show that BENoC facilitates an average system speedup of about 300% for several benchmarks compared to a pure NoC-based communication infrastructure. For the test system, the area requirement of the bus is less than 0.1% of the die area for a 0.18 µm process. Area numbers for the NoC are not given.

Another approach is followed by [15]. Here, long-range links are inserted on top of a regular mesh network (see figure 2(b)). The long-range links consist of segments of fixed-length links connected by repeaters with buffering capabilities.
Determination of number, length, and location of the long-range links is done at design time for a given application. The long-range links are used for any kind of traffic as long as they provide a shorter path and do not cause deadlocks. The area overhead caused by the long-range links is about 10% for a 4 × 4 mesh with 4 long-range links using a Xilinx Virtex-II FPGA as hardware platform. Overall energy consumption increases by about 1%. For performance evaluation, the critical traffic workload, the number of injected packets per cycle at which packet delivery rate rises abruptly, is considered. Results for a 4 × 4 auto industry benchmark and a 5 × 5 telecom benchmark show that the critical traffic workload is increased by 13.6% and 36.3%, respectively. The application-specific insertion of long-range links at design time hampers the usage of this NoC architecture for systems with unforeseeable communication patterns, i.e. adaptive manycore systems. Since the underlying communication architecture is still a regular mesh NoC, this work is nevertheless considered within this paper.

[Fig. 2. Heterogeneous NoC architectures: (a) BENoC [14], (b) long-range links [15]. Legend entries: processing core, NoC switch, NoC switch with extra port, repeater, NoC link, bus, long-range link.]

B. Logical Topology

A NoC design with a reconfigurable, circuit-switched logical topology called ReNoC (Reconfigurable NoC) is presented in [16]. Conventional NoC routers are wrapped by topology switches which form a configurable layer between routers and links. These topology switches can either be configured to connect a link to a router port or to directly connect two links with each other, bypassing the router. Thus, it is possible to form logical long links between two cores, two routers, or between a processing core and a router. The logical links are created on top of a static NoC topology and form a logical topology with a combination of circuit-switched and packet-switched elements. Configuration of the topology switches is done according to the communication needs of the currently running application. Details about the configuration process are not given. For a video object decoder application, the authors show that ReNoC facilitates a decrease in power consumption of about 56% compared to a static mesh topology. The topology switches lead to an area increase of about 10%.

IV. PER ROUTER PRIORITIZATION

Prioritization on per-hop basis is done by providing prioritized access to router resources [17] or by speculative forwarding and execution of router pipeline stages [18], [19]. A summary of the main characteristics of these NoCs is given in table I. Note that latency values are given per router while overhead values are given for complete test systems.

TABLE I
MAIN CHARACTERISTICS OF NOCS WITH PER HOP PRIORITIZATION
Ref. | No load latency (standard) | No load latency (prioritized) | Power overhead | Area overhead | Path determination | Speculative pipeline | Dead flits
[17] | 2 clock cycles | 1 clock cycle | −2.5% | 1.34% | automatic | yes | no
[18] | 3 clock cycles | 1 clock cycle | 8.0% to 10.5% | 6.4% to 15.9% | automatic | yes | yes
[19] | 1 clock cycle | delay of tri-state buffer | not given | 13% | manual | no | yes

A. Prioritized Access to Router Resources

A dynamic path management scheme on router level is proposed in [17]. The idea is that flits arriving on a frequently used input/output link combination are prioritized over other traffic during switch allocation. The input/output pair forms a fast path and a virtual channel (VC) is dedicated to the fast flits. Fast paths are determined locally by collecting statistics of intra-router transfer patterns. The router pipeline for flits travelling along a fast path can be reduced further by sending the switch arbitration request to the next node before the flit actually enters the node. To this end, the switch allocation request is sent to the next node while the flit traverses the crossbar. Performance is evaluated using a CMP architecture and executing applications from the SPLASH benchmark. Network latency is reduced by up to 30% and power consumption is reduced by about 2.5% at the cost of an area increase of 1.34%.

B. Speculative Forwarding

Another method to reduce the router pipeline latency is presented in [18]. For each idle input link, the output link being used by the next packet transfer is predicted and switch arbitration is speculatively completed. If the prediction hits, the routing computation and switch arbitration stages of the router pipeline are bypassed and switch traversal is completed in a single cycle. Otherwise, packets are transferred to the original router pipeline without any additional latency overhead. Wrongly forwarded flits are masked in the output channel. For prediction, several adaptive and static schemes were proposed and analyzed according to their hit rate. In order to facilitate a better adaptation to different traffic characteristics, it is also possible to implement several prediction schemes for one input link in parallel. The optimal prediction scheme is selected by choosing the one with the highest hit rate over a given time period. In three case studies, the latency is reduced in the range of 30.7% to 48.2% at the cost of an increase in area and power in the range of 6.4% to 15.9% and 8.0% to 10.5%, respectively.

A combination of speculative forwarding and setting up of preferred paths is presented in [19]. The authors adapt the “mad-postman” technique and speculatively forward flits to a pre-configured output, bypassing the router logic. Technically, this is realized by connecting all inputs directly with each output via tri-state buffers. If an output has a preferred input, the corresponding tri-state buffer is preselected and all flits arriving at that input are forwarded. In order to detect mistakenly forwarded flits, incoming flits are also checked as to whether the last forwarding led the flit closer to its destination. If this is not the case, the flit is stored in a FIFO and transferred using the standard routing functionality. Mistakenly forwarded flits will be identified as dead flits at the first router not being part of the preferred path and are deleted. Preferred paths are set up by single-flit packets and can be changed at runtime. The preferred path latency is a function of the number of hops, of the delay of the tri-state buffers and of the links. Area overhead is given as about 13% for a whole test chip.

V. PRIORITIZATION PER PATH SEGMENT

NoC architectures prioritizing packets along path segments spanning several hops are presented in this section. This kind of prioritization is normally done by selecting dedicated VCs as realized in the NoC designs of [2], [20], [21]. Along the path segments, the prioritized packets bypass router pipeline stages, which results in a reduced latency. A combination of path segment prioritization and virtual, circuit-switched network topology is presented in [22]. This network creates paths for prioritized packets which may lead from core to core. Thus, this NoC could have been categorized as a core-to-core prioritization architecture. Yet, as the length of the prioritized path is not guaranteed, it is categorized as a per path segment optimizing NoC. A summary of the main characteristics of all NoC architectures from this section is given in table II.

TABLE II
MAIN CHARACTERISTICS OF NOCS WITH PRIORITIZATION OF PATH SEGMENTS
Ref. | No load latency (standard) | No load latency (prioritized) | Power reduction | Area overhead | Type of virtual connection | Configuration
[2] | 4 clock cycles | 2 clock cycles | 38% | control lines | arbitrary nodes in one dimension | design time
[20] | 4 clock cycles | 2 clock cycles | 44% | control network | arbitrary nodes in one dimension | runtime
[21] | 5 clock cycles | 2 clock cycles | 18% | 2% | core to core | runtime
[22] | 5 clock cycles | wire delay | 22% | < 10% | core/node to node/core | runtime

A. Virtual Links assigned to VCs

The concept of Express Virtual Channels (EVCs) is presented in [2]. At any router port, the set of VCs is partitioned between normal VCs (NVCs) and EVCs. EVCs provide virtual express lanes in the network which are used to bypass intermediate routers by skipping the router pipeline. EVCs are restricted to connect routers only along a single dimension and are not allowed to turn. Focusing on dynamic EVCs, each router can act as a source/sink of EVCs or as a bypass node. The length of each EVC can be configured in advance, allowing dynamic adaptation to different traffic patterns. Packets normally try to acquire the longest possible EVC along their path. In case of high contention for a particular EVC, shorter EVCs can be chosen. If all possible EVCs are occupied, NVCs are used. While virtual express lanes are mapped on top of a regular mesh topology and do not require extra wires, extra control lines are needed for flow control between sinks and sources of individual EVCs. For the SPLASH benchmark, the authors show a latency reduction of 84%, a power reduction of 38% and a throughput improvement of 23%.

An improved version of EVCs [2] based on a hybrid interconnect called NOCHI is given in [20]. The EVC network is supplemented by a control plane comprised of global lines spanning all nodes in a row or column. The global lines are used for exchanging broadcast control information and flow control messages and replace the dedicated point-to-point control wires required in the initial EVC design. They are based on capacitive feed-forward circuits and are extended for one-cycle multi-broadcast abilities and collision detection with node quantity determination. The original EVC flow control mechanism is extended to allow a flexible binding of EVCs of arbitrary length to a node and to a more advanced buffer signaling. Compared to the original EVC design, an additional 44% improvement in latency under heavy load and a reduction of power of up to 8.2% are achieved.

Another approach for setting up direct virtual links at runtime is given in [21]. Based on the current NoC state, virtual point-to-point (VIP) paths are created, allowing packets to bypass the pipeline of intermediate routers. Packets travel along VIPs by using a dedicated VC, for which each router is pre-configured to forward the packet to a designated output. Each router port can be used by at most one VIP connection. In combination with prioritizing VIP packets over normal NoC traffic, this means that VIP connections never encounter busy channels along their path. VIPs are set up using a simple and small bit-width setup network controlled by a root node. This network is also used to collect the monitoring data of each router. Periodically, the root node checks whether an adaptation of the VIP paths is required and manages the tearing down of old and setting up of new VIPs. Evaluation is done using a multicore SoC with different benchmarks running on the same cores. Results show an average latency reduction of 44% and a power reduction of 17% compared to a conventional NoC.

B. Spatial Division Multiplexing

A combination of a packet-switched NoC and a circuit-switched NoC is proposed in [22]. Using spatial-division multiplexing, network resources are split between a packet-switching sub-network (Pnet) and a circuit-switched sub-network (Cnet). Configuration of the Cnet is done by a light-weight setup network called Snet. Processing of flits arriving at the Pnet is done in the same way as for standard packet-switched NoCs. The only difference is during the routing computation stage. If the Cnet part of the physical output link is free, the flit is moved to the Cnet. The Snet is used to build the longest possible direct link to the destination node. Flits traveling along the Cnet bypass the router pipeline and are sent in a pipelined fashion to the destination node. At the destination node, the flits are either transferred to the local core or are handled in the same way as flits arriving at the Pnet. For synthetic traffic patterns, the authors show latency and power reductions of 45% and 22%, respectively. An area overhead of less than 10% is mainly caused by the Snet.

VI. DISCUSSION

For runtime reconfigurable manycore systems, not only the standard NoC design parameters such as throughput, latency, power and area requirements are relevant. Key properties having high impact on these parameters are the adaptability to diverse traffic patterns, the selective prioritization of certain data streams, and implementation issues. Table III summarizes these relevant parameters for the NoC designs presented.

Concerning implementation issues, NoC designs for runtime reconfigurable manycore systems are often restricted by the hardware structure of and design tool limitations for Xilinx Virtex FPGAs. Apart from a few designs implemented on ASIC-style runtime reconfigurable platforms, these FPGAs form the basis for most runtime reconfigurable systems. Thus, NoC designs have to deal with their restrictions and limitations. The column "technology requirements" of table III lists the hardware requirements of NoC architectures [19] and [20] which hamper a direct realization on the Virtex FPGA platform. Implementation issues that complicate but do not prevent an FPGA realization are given in the column "layout anomalies" of table III. These restrictions are caused by the fact that for runtime reconfigurable FPGA designs, a homogeneous and regular system layout is desirable. Even though it is possible, routing signals through regions to be reconfigured is not advisable. As a result, additional, non-uniform control wires between nodes [2], physical fast paths [14], [15] or additional control networks [20], [21], [22] hamper a smooth design flow.

Another important design feature for runtime reconfigurable manycore systems is the ability to prioritize selected data streams. The column "prioritization of packets" of table III specifies whether prioritization is selectable or fixed for all packets. As pointed out in the introduction, a universal prioritization neglects the fact that often only a small amount of data is latency-sensitive [9]. NoCs that prioritize all packets along a route are well suited for semi-static data streams, yet small control messages might even be delayed in case they do not follow the main route. This is especially true for NoC architectures with speculative forwarding such as [17], [18]. Whether or not a NoC guarantees to prioritize a selected data stream is given in the column "guaranteed prioritization" of table III.

With regard to latency, all NoC architectures achieve significant improvements. Yet, they feature significant differences in energy consumption and area overhead. In general, NoCs providing core-to-core prioritization tend to show the highest area increase compared to standard mesh-topology NoC designs. This is a result of the additional physical communication structure such as a bus [14] or long-range links [15], or due to the additional logic for setting up virtual topologies [16]. The main advantage of these architectures is their near optimal communication latency for prioritized data streams. The additional communication infrastructure bypasses the router pipeline at each hop and reduces the communication delay to either the wire delay between nodes [14], [16], or to one clock cycle per hop [15]. In combination with reduced latency, bypassing of routers also leads to energy savings. As the additional physical communication links exist in parallel to the standard NoC, system throughput is increased, too. Another advantage of this NoC category is that latency-sensitive traffic is guaranteed to be prioritized. There is no speculation involved as in the link prioritization architectures presented in subsection B of section IV. When focusing on runtime reconfigurable manycore architectures, a drawback of this NoC category is its limited flexibility. Apart from [16], these designs are limited either by the message length to be transmitted along the additional infrastructure or by node locations. The design of [14] is optimized to transfer short control messages along the additional network and, thus, is not appropriate for data-intensive semi-static data streams. Manual insertion of long links at design time restricts placement of runtime exchangeable processing cores to certain locations in case they are to make use of these links [15].

Concerning adaptivity to changing traffic patterns, NoC architectures prioritizing on per hop basis or per segment basis provide a flexible option. The only exception is [2], where the architecture requires dedicated control lines for connecting source and sink of virtual links. Thus, virtual connections have to be defined at design time, which reduces system flexibility in the same way as the manual insertion of long links for NoC architecture [15]. Yet, the extended version of [2] presented in [20] circumvents this limitation. With the exception of [19], prioritization on per router basis requires neither dedicated control lines nor additional physical bypasses. As a result, these architectures are universally applicable. Yet, they suffer from reduced optimization potential. Flits always have to pass at least some stages of the router pipeline at each hop, which limits the achievable latency reduction. In order to reduce pipeline depth and, thus, the latency as much as possible, often complex speculative pipeline structures are used [17], [18]. This comes at the cost of area as well as power efficiency.

Concerning energy efficiency, the approaches of [17], [18] are problematic. Both designs speculatively forward flits and have to check afterwards whether this was correct or not. This increases switching activity and in case of [18] may also lead to congestion. In addition, the speculation failure rate becomes high in case of increasing network traffic [2]. For runtime reconfigurable systems, the self-adaptive approach of [17], [18] is favorable. These architectures do not require any configuration for prioritizing frequently chosen connections. Yet, the required configuration of routers in [19] has the advantages that a settling phase is avoided and that the communication network can be better adapted to latency-critical data flows.

TABLE III
CHARACTERISTICS OF PRESENTED NOC ARCHITECTURES FOR RUNTIME RECONFIGURABLE MANYCORE DESIGNS
Ref. | Kind of prioritization | Prioritization of packets | Guaranteed prioritization | Technology requirements | Layout anomalies | Remarks
[14] | additional bus | selectable | yes | - | bus in tree topology | bus can handle low-bandwidth only
[15] | physical long links | destination dependent | yes | - | long-links disturb regularity | flexibility restricted by fixed long links
[16] | circuit-switched logical links | fixed | yes | - | - | configurable virtual topology
[17] | reduced pipeline | fixed | no | - | - | -
[18] | reduced pipeline + speculative forwarding | fixed | no | - | - | -
[19] | reduced pipeline + speculative forwarding | fixed | no | tri-state buffers | - | configurable paths
[2] | virtual express channels | fixed | no | - | extra control lines | virtual express channels in one dimension only
[20] | virtual express channels | fixed | no | capacitive feed-forward circuits | control network | virtual express channels in one dimension only
[21] | virtual point-to-point links | selectable | yes | - | setup network | -
[22] | circuit-switched logical links | fixed | yes | - | setup network | -
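
Table III can also be read as a small data set that is filtered by the requirements of a given target system. The following Python sketch encodes four of its columns and shows two example queries. The class `NocDesign`, its field names and the two helper functions are hypothetical and introduced for illustration only. An empty string stands for a "-" entry in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NocDesign:
    ref: str
    packet_prioritization: str    # "selectable", "fixed" or "destination dependent"
    guaranteed: bool              # column "guaranteed prioritization"
    technology_requirements: str  # "" if table III lists none
    layout_anomalies: str         # "" if table III lists none


# Entries transcribed from table III (the remarks column is omitted).
TABLE_III = [
    NocDesign("[14]", "selectable", True, "", "bus in tree topology"),
    NocDesign("[15]", "destination dependent", True, "", "long-links disturb regularity"),
    NocDesign("[16]", "fixed", True, "", ""),
    NocDesign("[17]", "fixed", False, "", ""),
    NocDesign("[18]", "fixed", False, "", ""),
    NocDesign("[19]", "fixed", False, "tri-state buffers", ""),
    NocDesign("[2]", "fixed", False, "", "extra control lines"),
    NocDesign("[20]", "fixed", False, "capacitive feed-forward circuits", "control network"),
    NocDesign("[21]", "selectable", True, "", "setup network"),
    NocDesign("[22]", "fixed", True, "", "setup network"),
]


def guaranteed_and_selectable(designs):
    """Designs that prioritize only selected packets and guarantee it."""
    return [d for d in designs
            if d.guaranteed and d.packet_prioritization == "selectable"]


def without_listed_obstacles(designs):
    """Designs with neither technology requirements nor layout anomalies."""
    return [d for d in designs
            if not d.technology_requirements and not d.layout_anomalies]


if __name__ == "__main__":
    print([d.ref for d in guaranteed_and_selectable(TABLE_III)])
    print([d.ref for d in without_listed_obstacles(TABLE_III)])
```

Such a query only reproduces what the table states. It does not weigh the criteria against each other, which remains a design decision for the specific system.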

The VC-based NoCs with prioritization per path segment summarized in subsection A of Section V suffer from the same drawback as NoCs prioritizing individual input/output connections per router: flits have to pass at least some router pipeline stages, which lowers the achievable latency reduction. The approach of [2] and its extension in [20] also limit virtual links to one dimension. If source and sink are not located in the same dimension, flits have to pass the full router pipeline at least three times. In contrast, [21] allows setting up virtual connections between cores or routers directly.

With regard to virtual connections, the approach of [16] is the most flexible one, as a complete virtual topology can be generated on top of the physical network. While remaining configurable, this architecture enables the design of an application-specific infrastructure and is thus well suited for runtime reconfigurable systems. The only drawback of this approach is that configuration affects an entire link: if a core sends data to two different destinations, a direct point-to-point connection cannot be set up.

VII. ACKNOWLEDGEMENT

This work was funded in part by the German Research Foundation (DFG) within priority programme 1148 under grant reference Ma 1412/5.

REFERENCES

[1] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, “Cost considerations in network on chip,” Integration, the VLSI Journal, vol. 38, no. 1, pp. 19–42, 2004.
[2] A. Kumar, L.-S. Peh, P. Kundu, and N. K. Jha, “Express Virtual Channels: Towards the Ideal Interconnection Fabric,” in 34th Int. Symp. on Computer Architecture, 2007, pp. 150–161.
[3] R. Mullins, A. West, and S. Moore, “Low-Latency Virtual-Channel Routers for On-Chip Networks,” in 31st Int. Symp. on Computer Architecture, 2004, pp. 188–197.
[4] L.-S. Peh and W. Dally, “A delay model and speculative architecture for pipelined routers,” in 7th Int. Symp. on High-Performance Computer Architecture (HPCA), 2001, pp. 255–266.
[5] K. Kim, S.-J. Lee, K. Lee, and H.-J. Yoo, “An Arbitration Look-Ahead Scheme for Reducing End-to-End Latency in Networks on Chip,” in IEEE Int. Symp. on Circuits and Systems (ISCAS), 2005, pp. 2357–2360.
[6] A. Kodi, A. Louri, and J. Wang, “Design of energy-efficient channel buffers with router bypassing for network-on-chips (NoCs),” in Quality of Electronic Design (ISQED), 2009, pp. 826–832.
[7] L. Xin and C.-s. Choy, “A Low-latency NoC Router with Lookahead Bypass,” in IEEE Int. Symp. on Circuits and Systems (ISCAS), 2010, pp. 3981–3984.
[8] A. Kumar, L.-S. Peh, and N. Jha, “Token Flow Control,” in 41st IEEE/ACM Int. Symp. on Microarchitecture (MICRO-41), 2008, pp. 342–353.
[9] Z. Li, J. Wu, L. Shang, R. Dick, and Y. Sun, “Latency Criticality Aware On-Chip Communication,” in Design, Automation & Test in Europe Conference (DATE), 2009, pp. 1052–1057.
[10] P. T. Wolkotte, G. J. Smit, Rauwerda, and L. T. Smit, “An Energy-Efficient Reconfigurable Circuit-Switched Network-on-Chip,” in 19th Int. Parallel and Distributed Processing Symp., 2005, pp. 155a–155a.
[11] J. Chan and S. Parameswaran, “NoCOUT: NoC Topology Generation with Mixed Packet-switched and Point-to-Point Networks,” in Asia and South Pacific Design Automation Conference, 2008, pp. 265–270.
[12] B. Grot, J. Hestness, S. Keckler, and O. Mutlu, “Express Cube Topologies for on-Chip Interconnects,” in IEEE 15th Int. Symp. on High Performance Computer Architecture (HPCA), 2009, pp. 163–174.
[13] J. Kim, J. Balfour, and W. Dally, “Flattened Butterfly Topology for On-Chip Networks,” in 40th IEEE/ACM Int. Symp. on Microarchitecture (MICRO), 2007, pp. 172–182.
[14] R. Manevich, I. Walter, I. Cidon, and A. Kolodny, “Best of Both Worlds: A Bus Enhanced NoC (BENoC),” in 3rd ACM/IEEE Int. Symp. on Networks-on-Chip (NoCS), 2009, pp. 173–182.
[15] U. Ogras and R. Marculescu, “Application-Specific Network-on-Chip Architecture Customization via Long-Range Link Insertion,” in Int. Conf. on Computer-Aided Design (ICCAD), 2005, pp. 246–253.
[16] M. Stensgaard and J. Sparso, “ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology,” in Second ACM/IEEE Int. Symp. on Networks-on-Chip (NoCS), 2008, pp. 55–64.
[17] D. Park, R. Das, C. Nicopoulos, J. Kim, N. Vijaykrishnan, R. Iyer, and C. Das, “Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects,” in 15th IEEE Symp. on High-Performance Interconnects (HOTI), 2007, pp. 15–20.
[18] H. Matsutani, M. Koibuchi, H. Amano, and T. Yoshinaga, “Prediction router: Yet another low latency on-chip router architecture,” in IEEE 15th Int. Symp. on High Performance Computer Architecture (HPCA), 2009, pp. 367–378.
[19] G. Michelogiannakis, D. Pnevmatikatos, and M. Katevenis, “Approaching Ideal NoC Latency with Pre-Configured Routes,” in First Int. Symp. on Networks-on-Chip (NOCS), 2007, pp. 153–162.
[20] T. Krishna, A. Kumar, P. Chiang, M. Erez, and L.-S. Peh, “NoC with Near-Ideal Express Virtual Channels Using Global-Line Communication,” in 16th IEEE Symp. on High Performance Interconnects (HOTI), 2008, pp. 11–20.
[21] M. Modarressi, A. Tavakkol, and H. Sarbazi-Azad, “Virtual Point-to-Point Connections for NoCs,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 855–868, 2010.
[22] M. Modarressi, H. Sarbazi-Azad, and M. Arjomand, “A Hybrid Packet-Circuit Switched on-Chip Network Based on SDM,” in Design, Automation & Test in Europe Conference (DATE), 2009, pp. 566–569.
