Latency Reduction of Selected Data Streams in Network-on-Chips for Adaptive Manycore Systems

Thilo Pionteck, Christoph Osterloh
Institute of Computer Engineering
Universität zu Lübeck
23538 Lübeck, Germany
Email: {pionteck, osterloh}@iti.uni-luebeck.de

Carsten Albrecht
Dräger Medical GmbH
23558 Lübeck, Germany
Email: Carsten.Albrecht@draeger.com



Abstract: This paper reviews Network-on-Chip architectures with prioritization of selected data streams targeting runtime reconfigurable manycore systems. The common idea of these architectures is to minimize the latency of selected packet transmissions by either bypassing or parallelizing processing stages in routers or by using dedicated links bypassing complete routers. Potential classes of selected data streams are latency-critical messages, e.g. cache accesses in multiprocessor systems, or semi-static data streams, i.e. streams in systems in which the same components continuously exchange data for a longer period. The review categorizes the diverse architectures and evaluates their pros and cons in terms of universality, hardware efficiency and support of changing traffic patterns.

I. INTRODUCTION

With the emergence of manycore systems and the increased need for scalable global on-chip communication architectures, Network-on-Chips (NoCs) are becoming the dominant communication architecture for complex system designs. Compared to shared buses and point-to-point connections, NoCs feature high scalability, high throughput and cost efficiency in terms of area and power [1]. The main drawback of NoCs is that packets have to pass several routers along their path. At each router, packets compete for router resources while going through a complex processing pipeline [2]. Depending on the number of hops, this results in a significant communication latency, limiting the overall system performance. If a large number of consecutive packets follow the same path, some internal router processing steps are not even required, as, for example, the routing decision will be the same for all packets. If these processing steps are left out, latency as well as power consumption can be reduced.

A way to minimize the number of hops is to customize the network topology according to the communication characteristics of the processing elements for a given application scenario. Yet, this contradicts the idea of a universal communication architecture, increases the design effort, and is not applicable to systems with a wide range of application scenarios. This approach also implies that type and location of processing cores are fixed during the lifetime of a system. For runtime reconfigurable systems, where processing cores can be exchanged at runtime, this assumption does not hold. Such systems show changing communication patterns during system lifetime and ask for regular communication architectures.

If the number of hops cannot be reduced, communication latency can still be lowered by reducing the latency of individual routers. Appropriate techniques are speculative execution of router pipeline stages in parallel [3], [4] and pre-computing routing decisions using look-ahead schemes [5], [6], [7]. End-to-end latency can also be reduced by using adaptive routing schemes, which allow packets to bypass nodes with high congestion. The work presented in [8] describes such a NoC in combination with the ability to bypass the router pipeline. Common disadvantages of these approaches are their increased hardware effort and a latency unsuitable for latency-critical messages. Based on the observation that only a certain share of messages is latency-critical or shows semi-static characteristics, it is favorable to prioritize this kind of data only. In [9], the composition and amount of latency-critical messages in shared-memory chip multi-processor (CMP) systems are analyzed. The authors identify protocol requests, acknowledgment packets and critical word packets in read and write transactions as the main categories of latency-critical messages. It is also shown that data traffic exhibits a strong temporal and spatial locality and accounts for about 18% of the overall traffic. Examples of semi-static data streams are given in [10]. Here, implementations of wireless communication standards are analyzed with regard to their inter-module communication characteristics. The authors show that for long periods subsequent data items of a stream follow the same route and have periodic behavior.

The aim of this paper is to provide an overview of diverse approaches for prioritizing latency-critical or semi-static data streams in NoC-based communication architectures. A focus is set on NoC architectures suitable for runtime reconfigurable manycore systems. Irregular NoC architectures such as [11], [12] are not considered, nor are NoCs based on non-mesh topologies such as [13]. The former are restricted to certain communication patterns while the latter ask for high radix routers, significantly increasing the area footprint.

Criteria for categorizing NoC architectures with latency reduction techniques are given in section II. Among these criteria, the effect of a prioritization is chosen for categorizing the NoC architectures: per end-to-end connection (section III), per router (section IV), or per path segment (section V). A discussion of pros and cons of the architectures with regard to


978-1-4244-8971-8/10$26.00 © 2010 IEEE
the requirements of runtime reconfigurable manycore systems is given in section VI.

II. DESIGN SPACE

There is a huge design space for latency reduction techniques within NoCs. Focusing on NoCs suitable for adaptive manycore systems and applying the design constraints discussed in the previous section, four features that mainly distinguish the architectures are identified: effect of prioritization, switching technique, kind of decision making and link type.

Effect of prioritization: The aim of all presented NoCs is to minimize the core-to-core latency for packets. Yet, the architectures differ in the effect of their prioritization efforts. Prioritization can be applied once for a core-to-core connection, per router or per path segment. When prioritization is done on the basis of core-to-core connections, it is done only once at the packet's entry point to the communication network. This prioritization persists until the packet reaches its final destination. With this technique, packets normally bypass the standard NoC infrastructure, which results in the highest potential for latency reduction. This is in contrast to prioritization on a per-hop basis. Here, prioritization has to be redetermined at every hop, causing time overhead and thus reducing the achievable latency reduction. In between these two extremes is prioritization on a per path segment basis. Here, packets travel along prioritized path segments. These path segments may not cover the complete route from source to sink. Prioritization decisions have to be made for each path segment individually. As these three prioritization methods are mutually exclusive, they are chosen for categorizing the NoC architectures reviewed in this paper.

Switching technique: For transferring prioritized packets, either packet switching or circuit switching techniques are applied. Packet switching in this context means that a packet travels along several routers from source to destination. At each router, routing decisions and arbitration of network resources have to be done. This is in contrast to circuit switching. Here, a dedicated path from source to destination exists. Note that within this work, even for circuit-switched transmissions the data stream is divided into packets. In general, this is done for comparability with the non-prioritized traffic.

Decision making: This criterion determines whether routing for the prioritized packets is made deterministically or speculatively. The former means that the routing decision is guaranteed to be correct, which is not the case for the latter. Speculative forwarding also carries the risk of generating dead flits, i.e. flits which are sent on a wrong link and which have to be deleted.

Link type: The fast path to be used by prioritized packets can either be physical or logical. A logical fast path is mapped on top of the physical network. The packets follow the same route as non-prioritized packets; only parts of the router are bypassed. Physical fast paths are dedicated connections implemented in silicon.

Typical parameter combinations of latency-optimized NoCs are presented in figure 1. Primary classification has been done according to the effect of prioritization. In addition, figure 1 shows the parameters of the individual NoCs presented in the next sections.

[Figure 1: matrix of the design space. Rows (prioritization): core-to-core with examples [14], [15], [16]; per router with examples [17], [18], [19]; per path segment with examples [2], [20], [21], [22]. Columns: switching technique (packet-switched, circuit-switched), decision making (speculative, deterministic), link type (physical, logical).]
Fig. 1. NoC prioritization techniques design space

III. CORE-TO-CORE PRIORITIZATION

This section comprises NoC architectures featuring core-to-core optimization. For the fast path, they all provide direct connections from source to sink, either by using a bus [14], by providing dedicated long-range links [15] or by forming a logical topology on top of the physical topology [16]. As the first two NoC designs provide additional physical connectivity, they are sub-categorized as heterogeneous NoCs, while the last one is sub-categorized as a logical one.

A. Heterogeneous NoCs

The main characteristic of both architectures presented here is that they employ a standard regular grid-shaped NoC for most of the traffic. Long distance or latency sensitive traffic is bypassed using the additional communication infrastructure. An architectural view on both NoCs is given in figure 2.

A combination of a NoC with a shared bus is proposed in [14]. The bus enhanced NoC (BENoC), as shown in figure 2(a), is composed of a packet switched grid-shaped NoC and a low latency, low bandwidth bus. The bus is used for global, latency-critical control signals, provides broadcast as well as multicast capabilities, and can also be used for the configuration and management of the NoC. High throughput data communication between cores is handled by the NoC. Performance is evaluated by means of a dynamic non-uniform cache access architecture (DNUCA) multi-processor system consisting of 16 processors and 64 L2 cache tiles. The authors show that BENoC facilitates an average system speedup of about 300% for several benchmarks compared to a pure NoC-based communication infrastructure. For the test system, the area requirement of the bus is less than 0.1% of the die area for a 0.18 µm process. Area numbers for the NoC are not given.

Another approach is followed by [15]. Here, long-range links are inserted on top of a regular mesh network (see figure 2(b)). The long-range links consist of segments of fixed-length links connected by repeaters with buffering capabilities.
Determination of number, length, and location of the long-range links is done at design time for a given application. The long-range links are used for any kind of traffic as long as they provide a shorter path and do not cause deadlocks. The area overhead caused by the long-range links is about 10% for a 4 × 4 mesh with 4 long-range links using a Xilinx Virtex-II FPGA as hardware platform. Overall energy consumption increases by about 1%. For performance evaluation, the critical traffic workload, the number of injected packets per cycle at which packet delivery rate rises abruptly, is considered. Results for a 4 × 4 auto industry benchmark and a 5 × 5 telecom benchmark show that the critical traffic workload is increased by 13.6% and 36.3%, respectively. The application-specific insertion of long-range links at design time hampers the usage of this NoC architecture for systems with unforeseeable communication patterns, i.e. adaptive manycore systems. Yet, as the underlying communication architecture is still a regular mesh NoC, this work is considered within this paper.

[Figure 2: (a) BENoC [14], showing processing cores, NoC switches, NoC links and the bus; (b) long-range links [15], showing processing cores, NoC switches, NoC switches with extra port, repeaters, NoC links and long-range links.]
Fig. 2. Heterogeneous NoC architectures

B. Logical Topology

A NoC design with a reconfigurable, circuit-switched logical topology called ReNoC (Reconfigurable NoC) is presented in [16]. Conventional NoC routers are wrapped by topology switches which form a configurable layer between routers and links. These topology switches can either be configured to connect a link to a router port or to directly connect two links with each other, bypassing the router. Thus, it is possible to form logical long links between two cores, two routers, or between a processing core and a router. The logical links are created on top of a static NoC topology and form a logical topology with a combination of circuit switched and packet switched elements. Configuration of the topology switches is done according to the communication needs of the currently running application. Details about the configuration process are not given. For a video object decoder application, the authors show that ReNoC facilitates a decrease in power consumption of about 56% compared to a static mesh topology. The topology switches lead to an area increase of about 10%.

IV. PER ROUTER PRIORITIZATION

Prioritization on a per-hop basis is done by providing prioritized access to router resources [17] or by speculative forwarding and execution of router pipeline stages [18], [19]. A summary of the main characteristics of these NoCs is given in table I. Note that latency values are given per router while overhead values are given for complete test systems.

A. Prioritized Access to Router Resources

A dynamic path management scheme on router level is proposed in [17]. The idea is that flits arriving on a frequently used input/output link combination are prioritized over other traffic during switch allocation. The input/output pair forms a fast path and a virtual channel (VC) is dedicated to the fast flits. Fast paths are determined locally by collecting statistics of intra-router transfer patterns. The router pipeline for flits travelling along a fast path can be reduced further by sending the switch arbitration request to the next node before the flit actually enters the node. To this end, the switch allocation request is sent to the next node while the flit traverses the crossbar. Performance is evaluated using a CMP architecture and executing applications from the SPLASH benchmark. Network latency is reduced by up to 30% and power consumption is reduced by about 2.5% at the cost of an area increase of 1.34%.

B. Speculative Forwarding

Another method to reduce the router pipeline latency is presented in [18]. For each idle input link, the output link used by the next packet transfer is predicted and switch arbitration is speculatively completed. If the prediction hits, the routing computation and switch arbitration stages of the router pipeline are bypassed and switch traversal is completed in a single cycle. Otherwise, packets are transferred to the original router pipeline without any additional latency overhead. Wrongly forwarded flits are masked in the output channel. For prediction, several adaptive and static schemes were proposed and analyzed according to their hit rate. In order to facilitate a better adaptation to different traffic characteristics, it is also possible to implement several prediction schemes for one input link in parallel. The optimal prediction scheme is selected by choosing the one with the highest hit rate over a given time period. In three case studies, the latency is reduced by 30.7% to 48.2% at the cost of an increase in area and power of 6.4% to 15.9% and 8.0% to 10.5%, respectively.

A combination of speculative forwarding and setting up of preferred paths is presented in [19]. The authors adapt the "mad-postman" technique and speculatively forward flits to a pre-configured output, bypassing the router logic. Technically, this is realized by connecting all inputs directly with each output via tri-state buffers. If an output has a preferred input, the corresponding tri-state buffer is preselected and all flits arriving at that input are forwarded. In order to detect mistakenly forwarded flits, incoming flits are also checked as to whether the last forwarding led the flit closer to its destination. If this is not the case, the flit is stored in a FIFO and transferred using the standard routing functionality. Mistakenly forwarded flits will be identified as dead flits at the first router not being part of the preferred path and are deleted. Preferred paths are set up by single-flit packets and can be changed at runtime. The preferred path latency is a function of the number of hops, of
TABLE I
MAIN CHARACTERISTICS OF NOCS WITH PER HOP PRIORITIZATION

NoC    No-load latency   No-load latency             Power           Area            Path            Speculative   Dead
       (standard)        (prioritized)               overhead        overhead        determination   pipeline      flits
[17]   2 clock cycles    1 clock cycle               -2.5%           1.34%           automatic       yes           no
[18]   3 clock cycles    1 clock cycle               8.0% to 10.5%   6.4% to 15.9%   automatic       yes           yes
[19]   1 clock cycle     delay of tri-state buffer   not given       13%             manual          no            yes
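
As a rough illustration of what the per-router values in Table I mean for a complete path, the following sketch multiplies the no-load router latency by the hop count. The one-cycle link traversal, the path length of 6 hops and the function name zero_load_latency are assumptions made for illustration, not figures from the reviewed papers. [19] is left out because its prioritized latency is a buffer delay rather than a cycle count.

```python
# Per-router no-load latencies in clock cycles, taken from Table I.
TABLE_I = {
    "[17]": {"standard": 2, "prioritized": 1},
    "[18]": {"standard": 3, "prioritized": 1},
}


def zero_load_latency(hops, router_cycles, link_cycles=1):
    """Cycles for a head flit to cross `hops` routers without contention.

    link_cycles is an assumed per-hop link traversal time; it is not
    a value given in Table I.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * (router_cycles + link_cycles)


if __name__ == "__main__":
    hops = 6  # hypothetical path length
    for noc, lat in TABLE_I.items():
        std = zero_load_latency(hops, lat["standard"])
        fast = zero_load_latency(hops, lat["prioritized"])
        print(f"{noc}: standard {std} cycles, prioritized {fast} cycles")
```

Under these assumptions the gap between standard and prioritized traffic grows linearly with the hop count, which is why per-hop savings matter most on long routes.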


the delay of the tri-state buffers and of the links. Area overhead is given as about 13% for a whole test chip.

V. PRIORITIZATION PER PATH SEGMENT

NoC architectures prioritizing packets along path segments spanning several hops are presented in this section. This kind of prioritization is normally done by selecting dedicated VCs as realized in the NoC designs of [2], [20], [21]. Along the path segments, the prioritized packets bypass router pipeline stages, which results in a reduced latency. A combination of path segment prioritization and a virtual, circuit-switched network topology is presented in [22]. This network creates paths for prioritized packets which may lead from core to core. Thus, this NoC could have been categorized as a core-to-core prioritization architecture. Yet, as the length of the prioritized path is not guaranteed, it is categorized as a per path segment optimizing NoC. A summary of the main characteristics of all NoC architectures from this section is given in table II.

A. Virtual Links assigned to VCs

The concept of Express Virtual Channels (EVCs) is presented in [2]. At any router port the set of VCs is partitioned between normal VCs (NVCs) and EVCs. EVCs provide virtual express lanes in the network which are used to bypass intermediate routers by skipping the router pipeline. EVCs are restricted to connect routers only along a single dimension and are not allowed to turn. Focusing on dynamic EVCs, each router can act as a source/sink of EVCs or as a bypass node. The length of each EVC can be configured in advance, allowing dynamic adaptation to different traffic patterns. Packets normally try to acquire the longest possible EVC along their path. In case of high contention for a particular EVC, shorter EVCs can be chosen. If all possible EVCs are occupied, NVCs are used. While virtual express lanes are mapped on top of a regular mesh topology and do not require extra wires, extra control lines are needed for flow control between sinks and sources of individual EVCs. For the SPLASH benchmark, the authors show a latency reduction of 84%, a power reduction of 38% and a throughput improvement of 23%.

An improved version of EVCs [2] based on a hybrid interconnect called NOCHI is given in [20]. The EVC network is supplemented by a control plane comprised of global lines spanning all nodes in a row or column. The global lines are used for exchanging broadcast control information and flow control messages and replace the dedicated point-to-point control wires required in the initial EVC design. They are based on capacitive feed-forward circuits and are extended for one-cycle multi-broadcast abilities and collision detection with node quantity determination. The original EVC flow control mechanism is extended to allow a flexible binding of EVCs of arbitrary length to a node and to support more advanced buffer signaling. Compared to the original EVC design, an additional 44% improvement in latency under heavy load and a reduction of power of up to 8.2% is achieved.

Another approach for setting up direct virtual links at runtime is given in [21]. Based on the current NoC state, virtual point-to-point (VIP) paths are created, allowing packets to bypass the pipeline of intermediate routers. Packets travel along VIPs by using a dedicated VC, for which each router is pre-configured to forward the packet to a designated output. Each router port can be used by at most one VIP connection. In combination with prioritizing VIP packets over normal NoC traffic, this ensures that VIP connections never encounter busy channels along their path. VIPs are set up using a simple and small bit-width setup network controlled by a root node. This network is also used to collect the monitoring data of each router. Periodically, the root node checks whether an adaptation of the VIP paths is required and manages the tearing down of old and setting up of new VIPs. Evaluation is done using a multicore SoC with different benchmarks running on the same cores. Results show an average latency reduction of 44% and a power reduction of 17% compared to a conventional NoC.

B. Spatial Division Multiplexing

A combination of a packet switched NoC and a circuit switched NoC is proposed in [22]. Using spatial-division multiplexing, network resources are split between a packet-switching sub-network (Pnet) and a circuit-switched sub-network (Cnet). Configuration of the Cnet is done by a light-weight setup network called Snet. Processing of flits arriving at the Pnet is done in the same way as for standard packet-switched NoCs. The only difference is during the routing computation stage. If the Cnet part of the physical output link is free, the flit is moved to the Cnet. The Snet is used to build the longest possible direct link to the destination node. Flits traveling along the Cnet bypass the router pipeline and are sent in a pipelined fashion to the destination node. At the destination node, the flits are either transferred to the local core or are handled in the same way as flits arriving at the Pnet. For synthetic traffic patterns, the authors show latency and power reductions of 45% and 22%, respectively. An area overhead of less than 10% is mainly caused by the Snet.

VI. DISCUSSION

For runtime reconfigurable manycore systems, not only the standard NoC design parameters such as throughput, latency, power and area requirements are relevant. Key properties having high impact on these parameters are the adaptability
TABLE II
MAIN CHARACTERISTICS OF NOCS WITH PRIORITIZATION OF PATH SEGMENTS

NoC    No-load latency   No-load latency   Power       Area              Type of virtual                    Configuration
       (standard)        (prioritized)     reduction   overhead          connection
[2]    4 clock cycles    2 clock cycles    38%         control lines     arbitrary nodes in one dimension   design time
[20]   4 clock cycles    2 clock cycles    44%         control network   arbitrary nodes in one dimension   runtime
[21]   5 clock cycles    2 clock cycles    18%         2%                core to core                       runtime
[22]   5 clock cycles    wire delay        22%         < 10%             core/node to node/core             runtime
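
For the EVC scheme of [2] listed in Table II, the selection rule described in section V-A (take the longest free EVC, fall back to shorter ones, and use an NVC if none is free) can be sketched as follows. The data layout, the function name select_channel and the cap by the remaining hops in the current dimension are assumptions made for illustration; the actual router logic of [2] is not reproduced here.

```python
def select_channel(hops_remaining, evc_lengths, busy):
    """Pick a channel for a packet travelling along one dimension.

    hops_remaining: hops left in the current dimension (EVCs do not turn).
    evc_lengths:    EVC lengths in hops that are configured at this router.
    busy:           set of EVC lengths that are currently occupied.
    Returns ("EVC", length) or ("NVC", 1).
    """
    # Try the longest EVC first, but never overshoot the remaining distance
    # (an assumption of this sketch).
    for length in sorted(evc_lengths, reverse=True):
        if length <= hops_remaining and length not in busy:
            return ("EVC", length)
    # Every usable EVC is occupied, or none fits: fall back to a normal VC.
    return ("NVC", 1)


if __name__ == "__main__":
    print(select_channel(5, [2, 3, 4], busy=set()))
    print(select_channel(5, [2, 3, 4], busy={4}))
    print(select_channel(1, [2, 3], busy=set()))
```

Sorting the lengths on every call keeps the sketch short; a hardware router would use a fixed priority order instead.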


to diverse traffic patterns, the selective prioritization of certain          wire delay between nodes [14], [16], or to one clock cycle per
data streams, and implementation issues. Table III summarizes                hop [15]. In combination with reduced latency, bypassing of
these relevant parameters for the NoC designs presented. Concerning implementation issues, NoC designs for runtime reconfigurable manycore systems are often restricted by the hardware structure and the design tool limitations of Xilinx Virtex FPGAs. Apart from a few designs implemented on ASIC-style runtime reconfigurable platforms, these FPGAs form the basis for most runtime reconfigurable systems. Thus, NoC designs have to deal with their restrictions and limitations. The column technology requirements of Table III lists the hardware requirements of the NoC architectures [19] and [20] that hamper a direct realization on the Virtex FPGA platform. Implementation issues that complicate but do not prevent an FPGA realization are given in the column layout anomalies of Table III. These restrictions are caused by the fact that for runtime reconfigurable FPGA designs, a homogeneous and regular system layout is desirable. Even though it is possible, routing signals through regions to be reconfigured is not advisable. As a result, additional, non-uniform control wires between nodes [2], physical fast paths [14], [15] or additional control networks [20], [21], [22] hamper a smooth design flow.
   Another important design feature for runtime reconfigurable manycore systems is the ability to prioritize selected data streams. The column prioritization of packets of Table III specifies whether prioritization is selectable or fixed for all packets. As pointed out in the introduction, a universal prioritization neglects the fact that often only a small amount of data is latency-sensitive [9]. NoCs that prioritize all packets along a route are well suited for semi-static data streams, yet small control messages might even be delayed if they do not follow the main route. This is especially true for NoC architectures with speculative forwarding such as [17], [18]. Whether or not a NoC guarantees to prioritize a selected data stream is given in the column guaranteed prioritization of Table III.
   With regard to latency, all NoC architectures achieve significant improvements. Yet, they feature significant differences in energy consumption and area overhead. In general, NoCs providing core-to-core prioritization tend to show the highest area increase compared to standard mesh-topology NoC designs. This is a result of the additional physical communication structure such as a bus [14] or long-range links [15], or of the additional logic for setting up virtual topologies [16]. The main advantage of these architectures is their near-optimal communication latency for prioritized data streams. The additional communication infrastructure bypasses the router pipeline at each hop and reduces the communication delay to either the [...]
   [...] routers also leads to energy savings. As the additional physical communication links exist in parallel to the standard NoC, system throughput is increased, too. Another advantage of this NoC category is that latency-sensitive traffic is guaranteed to be prioritized. There is no speculation involved as in the link prioritization architectures presented in subsection B of section IV. When focusing on runtime reconfigurable manycore architectures, a drawback of this NoC category is its limited flexibility. Apart from [16], these architectures are limited either by the length of the messages to be transmitted along the additional infrastructure or by node locations. [14] is optimized to transfer short control messages along the additional network and, thus, is not appropriate for data-intensive semi-static data streams. Manual insertion of long links at design time restricts the placement of runtime exchangeable processing cores to certain locations if they are to make use of these links [15].
   Concerning adaptivity to changing traffic patterns, NoC architectures prioritizing on a per-hop or per-segment basis provide a flexible option. The only exception is [2], where the architecture requires dedicated control lines for connecting source and sink of virtual links. Thus, virtual connections have to be defined at design time, which reduces system flexibility in the same way as the manual insertion of long links in the NoC architecture of [15]. Yet, the extended version of [2] presented in [20] circumvents this limitation. With the exception of [19], prioritization on a per-router basis requires neither dedicated control lines nor additional physical bypasses. As a result, these architectures are universally applicable. Yet, they suffer from reduced optimization potential. Flits always have to pass at least some stages of the router pipeline at each hop, which limits the achievable latency reduction. In order to reduce pipeline depth and, thus, latency as much as possible, complex speculative pipeline structures are often used [17], [18]. This comes at the cost of area as well as power efficiency. Concerning energy efficiency, the approaches of [17], [18] are problematic. Both designs speculatively forward flits and have to check afterwards whether this was correct or not. This increases switching activity and, in the case of [18], may also lead to congestion. In addition, the speculation failure rate becomes high with increasing network traffic [2]. For runtime reconfigurable systems, the self-adaptive approach of [17], [18] is favorable. These architectures do not require any configuration for prioritizing frequently chosen connections. Yet, the required configuration of routers in [19] has the advantages that a settling phase is avoided and that the communication network can be better adapted to latency-critical data flows.
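The latency trade-off discussed above can be illustrated with a simple zero-load latency model. The following Python sketch is not taken from any of the reviewed architectures: the function names are hypothetical, and the hop count, pipeline depths and link delay are assumed values chosen only to show why passing some pipeline stages at every hop limits the achievable reduction.

```python
# Illustrative zero-load latency model; all cycle counts are assumptions.

def per_hop_latency(hops, router_cycles, link_cycles=1):
    """Latency in cycles when every hop costs router_cycles plus link_cycles."""
    return hops * (router_cycles + link_cycles)


def bypass_latency(hops, link_cycles=1):
    """Latency in cycles when the router pipeline is bypassed at every hop,
    so that only the link traversal remains (an idealizing assumption)."""
    return hops * link_cycles


HOPS = 6  # assumed route length

baseline = per_hop_latency(HOPS, router_cycles=4)    # assumed 4-stage router pipeline
per_router = per_hop_latency(HOPS, router_cycles=1)  # assumed pipeline reduced to one stage
core_to_core = bypass_latency(HOPS)                  # assumed complete router bypass

print(baseline, per_router, core_to_core)
```

Under these assumed numbers, per-router prioritization still pays at least one router cycle per hop, whereas a complete bypass leaves only the link delay.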
TABLE III
CHARACTERISTICS OF PRESENTED NOC ARCHITECTURES FOR RUNTIME RECONFIGURABLE MANYCORE DESIGNS

Ref.  Kind of prioritization                     Prioritization of packets  Guaranteed prioritization  Technology requirements           Layout anomalies               Remarks
[14]  additional bus                             selectable                 yes                        -                                 bus in tree topology           bus can handle low bandwidth only
[15]  physical long links                        destination dependent      yes                        -                                 long links disturb regularity  flexibility restricted by fixed long links
[16]  circuit-switched logical links             fixed                      yes                        -                                 -                              configurable virtual topology
[17]  reduced pipeline                           fixed                      no                         -                                 -                              -
[18]  reduced pipeline + speculative forwarding  fixed                      no                         -                                 -                              -
[19]  reduced pipeline + speculative forwarding  fixed                      no                         tri-state buffers                 -                              configurable paths
[2]   virtual express channels                   fixed                      no                         -                                 extra control lines            virtual express channels in one dimension only
[20]  virtual express channels                   fixed                      no                         capacitive feed-forward circuits  control network                virtual express channels in one dimension only
[21]  virtual point-to-point links               selectable                 yes                        -                                 setup network                  -
[22]  circuit-switched logical links             fixed                      yes                        -                                 setup network                  -
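
For readers who want to filter the designs of Table III programmatically, the table can be encoded as plain records. The Python sketch below is one possible encoding: the field names and the two helper functions are hypothetical, and the data only restates the table entries (an empty string stands for a "-" entry).

```python
from typing import NamedTuple


class NocDesign(NamedTuple):
    ref: str          # citation key as used in Table III
    kind: str         # kind of prioritization
    packets: str      # prioritization of packets
    guaranteed: bool  # guaranteed prioritization
    technology: str   # technology requirements ("" if none listed)
    layout: str       # layout anomalies ("" if none listed)


TABLE_III = [
    NocDesign("[14]", "additional bus", "selectable", True, "", "bus in tree topology"),
    NocDesign("[15]", "physical long links", "destination dependent", True, "",
              "long links disturb regularity"),
    NocDesign("[16]", "circuit-switched logical links", "fixed", True, "", ""),
    NocDesign("[17]", "reduced pipeline", "fixed", False, "", ""),
    NocDesign("[18]", "reduced pipeline + speculative forwarding", "fixed", False, "", ""),
    NocDesign("[19]", "reduced pipeline + speculative forwarding", "fixed", False,
              "tri-state buffers", ""),
    NocDesign("[2]", "virtual express channels", "fixed", False, "", "extra control lines"),
    NocDesign("[20]", "virtual express channels", "fixed", False,
              "capacitive feed-forward circuits", "control network"),
    NocDesign("[21]", "virtual point-to-point links", "selectable", True, "", "setup network"),
    NocDesign("[22]", "circuit-switched logical links", "fixed", True, "", "setup network"),
]


def without_listed_obstacles(designs):
    """Designs with neither a technology requirement nor a layout anomaly listed."""
    return [d.ref for d in designs if not d.technology and not d.layout]


def selectable_and_guaranteed(designs):
    """Designs whose prioritization is both selectable and guaranteed."""
    return [d.ref for d in designs if d.packets == "selectable" and d.guaranteed]


print(without_listed_obstacles(TABLE_III))
print(selectable_and_guaranteed(TABLE_III))
```

With the table entries as encoded here, the first query returns [16], [17] and [18], and the second returns [14] and [21].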


   The VC-based NoCs with prioritization per path segment summarized in subsection A of section V suffer from the same drawback as NoCs prioritizing individual input/output connections per router: flits have to pass at least some router pipeline stages, lowering the achievable latency reduction. The approach of [2] and its extension in [20] also limit virtual links to one dimension. In case source and sink are not located in the same dimension, flits have to pass the full router pipeline at least three times. In contrast, [21] allows setting up virtual connections between cores or routers directly. With regard to virtual connections, the approach of [16] is the most flexible one, as a complete virtual topology can be generated on top of the physical network. While still being configurable, this architecture enables the design of an application-specific infrastructure and, thus, is well suited for runtime reconfigurable systems. The only drawback of this approach is that configuration affects an entire link. In case a core sends data to two different destinations, a direct point-to-point connection cannot be set up.

VII. ACKNOWLEDGEMENT

   This work was funded in part by the German Research Foundation (DFG) within priority programme 1148 under grant reference Ma 1412/5.

REFERENCES

 [1] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, “Cost considerations in network on chip,” Integration, the VLSI Journal, vol. 38, no. 1, pp. 19–42, 2004.
 [2] A. Kumar, L.-S. Peh, P. Kundu, and N. K. Jha, “Express Virtual Channels: Towards the Ideal Interconnection Fabric,” in 34th Int. Symp. on Computer Architecture, 2007, pp. 150–161.
 [3] R. Mullins, A. West, and S. Moore, “Low-Latency Virtual-Channel Routers for On-Chip Networks,” in 31st Int. Symp. on Computer Architecture, 2004, pp. 188–197.
 [4] L.-S. Peh and W. Dally, “A delay model and speculative architecture for pipelined routers,” in 7th Int. Symp. on High-Performance Computer Architecture (HPCA), 2001, pp. 255–266.
 [5] K. Kim, S.-J. Lee, K. Lee, and H.-J. Yoo, “An Arbitration Look-Ahead Scheme for Reducing End-to-End Latency in Networks on Chip,” in IEEE Int. Symp. on Circuits and Systems (ISCAS), 2005, pp. 2357–2360.
 [6] A. Kodi, A. Louri, and J. Wang, “Design of energy-efficient channel buffers with router bypassing for network-on-chips (NoCs),” in Quality of Electronic Design (ISQED), 2009, pp. 826–832.
 [7] L. Xin and C.-s. Choy, “A Low-latency NoC Router with Lookahead Bypass,” in IEEE Int. Symp. on Circuits and Systems (ISCAS), 2010, pp. 3981–3984.
 [8] A. Kumar, L.-S. Peh, and N. Jha, “Token Flow Control,” in 41st IEEE/ACM Int. Symp. on Microarchitecture (MICRO-41), 2008, pp. 342–353.
 [9] Z. Li, J. Wu, L. Shang, R. Dick, and Y. Sun, “Latency Criticality Aware On-Chip Communication,” in Design, Automation & Test in Europe Conference (DATE), 2009, pp. 1052–1057.
[10] P. T. Wolkotte, G. J. Smit, Rauwerda, and L. T. Smit, “An Energy-Efficient Reconfigurable Circuit-Switched Network-on-Chip,” in 19th Int. Parallel and Distributed Processing Symp., 2005, pp. 155a–155a.
[11] J. Chan and S. Parameswaran, “NoCOUT: NoC Topology Generation with Mixed Packet-switched and Point-to-Point Networks,” in Asia and South Pacific Design Automation Conference, 2008, pp. 265–270.
[12] B. Grot, J. Hestness, S. Keckler, and O. Mutlu, “Express Cube Topologies for on-Chip Interconnects,” in IEEE 15th Int. Symp. on High Performance Computer Architecture (HPCA), 2009, pp. 163–174.
[13] J. Kim, J. Balfour, and W. Dally, “Flattened Butterfly Topology for On-Chip Networks,” in 40th IEEE/ACM Int. Symp. on Microarchitecture (MICRO), 2007, pp. 172–182.
[14] R. Manevich, I. Walter, I. Cidon, and A. Kolodny, “Best of Both Worlds: A Bus Enhanced NoC (BENoC),” in 3rd ACM/IEEE Int. Symp. on Networks-on-Chip (NoCS), 2009, pp. 173–182.
[15] U. Ogras and R. Marculescu, “Application-Specific Network-on-Chip Architecture Customization via Long-Range Link Insertion,” in Int. Conf. on Computer-Aided Design (ICCAD), 2005, pp. 246–253.
[16] M. Stensgaard and J. Sparso, “ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology,” in Second ACM/IEEE Int. Symp. on Networks-on-Chip (NoCS), 2008, pp. 55–64.
[17] D. Park, R. Das, C. Nicopoulos, J. Kim, N. Vijaykrishnan, R. Iyer, and C. Das, “Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects,” in 15th IEEE Symp. on High-Performance Interconnects (HOTI), 2007, pp. 15–20.
[18] H. Matsutani, M. Koibuchi, H. Amano, and T. Yoshinaga, “Prediction router: Yet another low latency on-chip router architecture,” in IEEE 15th Int. Symp. on High Performance Computer Architecture (HPCA), 2009, pp. 367–378.
[19] G. Michelogiannakis, D. Pnevmatikatos, and M. Katevenis, “Approaching Ideal NoC Latency with Pre-Configured Routes,” in First Int. Symp. on Networks-on-Chip (NOCS), 2007, pp. 153–162.
[20] T. Krishna, A. Kumar, P. Chiang, M. Erez, and L.-S. Peh, “NoC with Near-Ideal Express Virtual Channels Using Global-Line Communication,” in 16th IEEE Symp. on High Performance Interconnects (HOTI), 2008, pp. 11–20.
[21] M. Modarressi, A. Tavakkol, and H. Sarbazi-Azad, “Virtual Point-to-Point Connections for NoCs,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 855–868, 2010.
[22] M. Modarressi, H. Sarbazi-Azad, and M. Arjomand, “A Hybrid Packet-Circuit Switched on-Chip Network Based on SDM,” in Design, Automation & Test in Europe Conference (DATE), 2009, pp. 566–569.

Contenu connexe

Tendances

Congestion Free Routes for Wireless Mesh Networks
Congestion Free Routes for Wireless Mesh NetworksCongestion Free Routes for Wireless Mesh Networks
Congestion Free Routes for Wireless Mesh NetworksNemesio Jr. Macabale
 
NS2 Projects 2014 in HCL velachery
NS2 Projects 2014 in HCL velacheryNS2 Projects 2014 in HCL velachery
NS2 Projects 2014 in HCL velacherySenthilvel S
 
EFFICIENT MULTI-PATH PROTOCOL FOR WIRELESS SENSOR NETWORKS
EFFICIENT MULTI-PATH PROTOCOL FOR WIRELESS SENSOR NETWORKSEFFICIENT MULTI-PATH PROTOCOL FOR WIRELESS SENSOR NETWORKS
EFFICIENT MULTI-PATH PROTOCOL FOR WIRELESS SENSOR NETWORKSijwmn
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
11.a study of congestion aware adaptive routing protocols in manet
11.a study of congestion aware adaptive routing protocols in manet11.a study of congestion aware adaptive routing protocols in manet
11.a study of congestion aware adaptive routing protocols in manetAlexander Decker
 
DETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2R
DETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2RDETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2R
DETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2Rijujournal
 
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...IJASCSE
 
Improved Good put using Harvest-Then-Transmit Protocol for Video Transfer
Improved Good put using Harvest-Then-Transmit Protocol for Video TransferImproved Good put using Harvest-Then-Transmit Protocol for Video Transfer
Improved Good put using Harvest-Then-Transmit Protocol for Video TransferEswar Publications
 
Performance Comparison of AODV and DSDV Routing Protocols for Ad-hoc Wireless...
Performance Comparison of AODV and DSDV Routing Protocols for Ad-hoc Wireless...Performance Comparison of AODV and DSDV Routing Protocols for Ad-hoc Wireless...
Performance Comparison of AODV and DSDV Routing Protocols for Ad-hoc Wireless...Narendra Singh Yadav
 
Qo s oriented distributed routing protocols : anna university 2nd review ppt
Qo s   oriented  distributed routing  protocols : anna university 2nd review pptQo s   oriented  distributed routing  protocols : anna university 2nd review ppt
Qo s oriented distributed routing protocols : anna university 2nd review pptAAKASH S
 
Performance Evaluation and Comparison of Ad-Hoc Source Routing Protocols
Performance Evaluation and Comparison of Ad-Hoc Source Routing ProtocolsPerformance Evaluation and Comparison of Ad-Hoc Source Routing Protocols
Performance Evaluation and Comparison of Ad-Hoc Source Routing ProtocolsNarendra Singh Yadav
 
ON THE SUPPORT OF MULTIMEDIA APPLICATIONS OVER WIRELESS MESH NETWORKS
ON THE SUPPORT OF MULTIMEDIA APPLICATIONS  OVER WIRELESS MESH NETWORKS ON THE SUPPORT OF MULTIMEDIA APPLICATIONS  OVER WIRELESS MESH NETWORKS
ON THE SUPPORT OF MULTIMEDIA APPLICATIONS OVER WIRELESS MESH NETWORKS ijwmn
 
Haqr the hierarchical ant based qos aware on demand routing for manets
Haqr the hierarchical ant based qos aware on demand routing for manetsHaqr the hierarchical ant based qos aware on demand routing for manets
Haqr the hierarchical ant based qos aware on demand routing for manetscsandit
 
Influence of Clustering on the Performance of MobileAd Hoc Networks (MANETs)
Influence of Clustering on the Performance of MobileAd Hoc Networks (MANETs)Influence of Clustering on the Performance of MobileAd Hoc Networks (MANETs)
Influence of Clustering on the Performance of MobileAd Hoc Networks (MANETs)Narendra Singh Yadav
 
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...IDES Editor
 

Tendances (20)

Congestion Free Routes for Wireless Mesh Networks
Congestion Free Routes for Wireless Mesh NetworksCongestion Free Routes for Wireless Mesh Networks
Congestion Free Routes for Wireless Mesh Networks
 
NS2 Projects 2014 in HCL velachery
NS2 Projects 2014 in HCL velacheryNS2 Projects 2014 in HCL velachery
NS2 Projects 2014 in HCL velachery
 
EFFICIENT MULTI-PATH PROTOCOL FOR WIRELESS SENSOR NETWORKS
EFFICIENT MULTI-PATH PROTOCOL FOR WIRELESS SENSOR NETWORKSEFFICIENT MULTI-PATH PROTOCOL FOR WIRELESS SENSOR NETWORKS
EFFICIENT MULTI-PATH PROTOCOL FOR WIRELESS SENSOR NETWORKS
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
11.a study of congestion aware adaptive routing protocols in manet
11.a study of congestion aware adaptive routing protocols in manet11.a study of congestion aware adaptive routing protocols in manet
11.a study of congestion aware adaptive routing protocols in manet
 
DETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2R
DETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2RDETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2R
DETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2R
 
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
 
23
2323
23
 
V25112115
V25112115V25112115
V25112115
 
Ijcnc050203
Ijcnc050203Ijcnc050203
Ijcnc050203
 
Improved Good put using Harvest-Then-Transmit Protocol for Video Transfer
Improved Good put using Harvest-Then-Transmit Protocol for Video TransferImproved Good put using Harvest-Then-Transmit Protocol for Video Transfer
Improved Good put using Harvest-Then-Transmit Protocol for Video Transfer
 
Fa25939942
Fa25939942Fa25939942
Fa25939942
 
Performance Comparison of AODV and DSDV Routing Protocols for Ad-hoc Wireless...
Performance Comparison of AODV and DSDV Routing Protocols for Ad-hoc Wireless...Performance Comparison of AODV and DSDV Routing Protocols for Ad-hoc Wireless...
Performance Comparison of AODV and DSDV Routing Protocols for Ad-hoc Wireless...
 
Qo s oriented distributed routing protocols : anna university 2nd review ppt
Qo s   oriented  distributed routing  protocols : anna university 2nd review pptQo s   oriented  distributed routing  protocols : anna university 2nd review ppt
Qo s oriented distributed routing protocols : anna university 2nd review ppt
 
Performance Evaluation and Comparison of Ad-Hoc Source Routing Protocols
Performance Evaluation and Comparison of Ad-Hoc Source Routing ProtocolsPerformance Evaluation and Comparison of Ad-Hoc Source Routing Protocols
Performance Evaluation and Comparison of Ad-Hoc Source Routing Protocols
 
ON THE SUPPORT OF MULTIMEDIA APPLICATIONS OVER WIRELESS MESH NETWORKS
ON THE SUPPORT OF MULTIMEDIA APPLICATIONS  OVER WIRELESS MESH NETWORKS ON THE SUPPORT OF MULTIMEDIA APPLICATIONS  OVER WIRELESS MESH NETWORKS
ON THE SUPPORT OF MULTIMEDIA APPLICATIONS OVER WIRELESS MESH NETWORKS
 
Haqr the hierarchical ant based qos aware on demand routing for manets
Haqr the hierarchical ant based qos aware on demand routing for manetsHaqr the hierarchical ant based qos aware on demand routing for manets
Haqr the hierarchical ant based qos aware on demand routing for manets
 
Influence of Clustering on the Performance of MobileAd Hoc Networks (MANETs)
Influence of Clustering on the Performance of MobileAd Hoc Networks (MANETs)Influence of Clustering on the Performance of MobileAd Hoc Networks (MANETs)
Influence of Clustering on the Performance of MobileAd Hoc Networks (MANETs)
 
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
 
Ab25144148
Ab25144148Ab25144148
Ab25144148
 

En vedette (14)

Iaetsd design and implementation of
Iaetsd design and implementation ofIaetsd design and implementation of
Iaetsd design and implementation of
 
84
8484
84
 
56
5656
56
 
2
22
2
 
50
5050
50
 
62
6262
62
 
52
5252
52
 
61
6161
61
 
55
5555
55
 
53
5353
53
 
49
4949
49
 
41
4141
41
 
75
7575
75
 
My profile
My profileMy profile
My profile
 

Similaire à 94

Network on Chip Architecture and Routing Techniques: A survey
Network on Chip Architecture and Routing Techniques: A surveyNetwork on Chip Architecture and Routing Techniques: A survey
Network on Chip Architecture and Routing Techniques: A surveyIJRES Journal
 
Performance Analysis of Mesh-based NoC’s on Routing Algorithms
Performance Analysis of Mesh-based NoC’s on Routing Algorithms Performance Analysis of Mesh-based NoC’s on Routing Algorithms
Performance Analysis of Mesh-based NoC’s on Routing Algorithms IJECEIAES
 
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIPA ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIPijaceeejournal
 
Noise Tolerant and Faster On Chip Communication Using Binoc Model
Noise Tolerant and Faster On Chip Communication Using Binoc ModelNoise Tolerant and Faster On Chip Communication Using Binoc Model
Noise Tolerant and Faster On Chip Communication Using Binoc ModelIJMER
 
A Flexible Software/Hardware Adaptive Network for Embedded Distributed Archit...
A Flexible Software/Hardware Adaptive Network for Embedded Distributed Archit...A Flexible Software/Hardware Adaptive Network for Embedded Distributed Archit...
A Flexible Software/Hardware Adaptive Network for Embedded Distributed Archit...csijjournal
 
A FLEXIBLE SOFTWARE/HARDWARE ADAPTIVE NETWORK FOR EMBEDDED DISTRIBUTED ARCHIT...
A FLEXIBLE SOFTWARE/HARDWARE ADAPTIVE NETWORK FOR EMBEDDED DISTRIBUTED ARCHIT...A FLEXIBLE SOFTWARE/HARDWARE ADAPTIVE NETWORK FOR EMBEDDED DISTRIBUTED ARCHIT...
A FLEXIBLE SOFTWARE/HARDWARE ADAPTIVE NETWORK FOR EMBEDDED DISTRIBUTED ARCHIT...csijjournal
 
Crosslayertermpaper
CrosslayertermpaperCrosslayertermpaper
CrosslayertermpaperB.T.L.I.T
 
CPCRT: Crosslayered and Power Conserved Routing Topology for congestion Cont...
CPCRT: Crosslayered and Power Conserved Routing Topology  for congestion Cont...CPCRT: Crosslayered and Power Conserved Routing Topology  for congestion Cont...
CPCRT: Crosslayered and Power Conserved Routing Topology for congestion Cont...IOSR Journals
 
AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP
AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIPAREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP
AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIPVLSICS Design
 
Area-Efficient Design of Scheduler for Routing Node of Network-On-Chip
Area-Efficient Design of Scheduler for Routing Node of Network-On-ChipArea-Efficient Design of Scheduler for Routing Node of Network-On-Chip
Area-Efficient Design of Scheduler for Routing Node of Network-On-ChipVLSICS Design
 
Intra cluster routing with backup
Intra cluster routing with backupIntra cluster routing with backup
Intra cluster routing with backupcsandit
 
Cross Layer- Performance Enhancement Architecture (CL-PEA) for MANET
Cross Layer- Performance Enhancement Architecture (CL-PEA) for MANETCross Layer- Performance Enhancement Architecture (CL-PEA) for MANET
Cross Layer- Performance Enhancement Architecture (CL-PEA) for MANETijcncs
 
DIA-TORUS:A NOVEL TOPOLOGY FOR NETWORK ON CHIP DESIGN
DIA-TORUS:A NOVEL TOPOLOGY FOR NETWORK ON CHIP DESIGNDIA-TORUS:A NOVEL TOPOLOGY FOR NETWORK ON CHIP DESIGN
DIA-TORUS:A NOVEL TOPOLOGY FOR NETWORK ON CHIP DESIGNIJCNCJournal
 
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...VLSICS Design
 
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...VLSICS Design
 
Ijarcet vol-2-issue-4-1420-1427
Ijarcet vol-2-issue-4-1420-1427Ijarcet vol-2-issue-4-1420-1427
Ijarcet vol-2-issue-4-1420-1427Editor IJARCET
 

Similaire à 94 (20)

Network on Chip Architecture and Routing Techniques: A survey
Network on Chip Architecture and Routing Techniques: A surveyNetwork on Chip Architecture and Routing Techniques: A survey
Network on Chip Architecture and Routing Techniques: A survey
 
Performance Analysis of Mesh-based NoC’s on Routing Algorithms
Performance Analysis of Mesh-based NoC’s on Routing Algorithms Performance Analysis of Mesh-based NoC’s on Routing Algorithms
Performance Analysis of Mesh-based NoC’s on Routing Algorithms
 
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIPA ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
 
Design of fault tolerant algorithm for network on chip router using field pr...
Design of fault tolerant algorithm for network on chip router  using field pr...Design of fault tolerant algorithm for network on chip router  using field pr...
Design of fault tolerant algorithm for network on chip router using field pr...
 
Noise Tolerant and Faster On Chip Communication Using Binoc Model
Noise Tolerant and Faster On Chip Communication Using Binoc ModelNoise Tolerant and Faster On Chip Communication Using Binoc Model
Noise Tolerant and Faster On Chip Communication Using Binoc Model
 
A Flexible Software/Hardware Adaptive Network for Embedded Distributed Archit...
A Flexible Software/Hardware Adaptive Network for Embedded Distributed Archit...A Flexible Software/Hardware Adaptive Network for Embedded Distributed Archit...
A Flexible Software/Hardware Adaptive Network for Embedded Distributed Archit...
 
A FLEXIBLE SOFTWARE/HARDWARE ADAPTIVE NETWORK FOR EMBEDDED DISTRIBUTED ARCHIT...
A FLEXIBLE SOFTWARE/HARDWARE ADAPTIVE NETWORK FOR EMBEDDED DISTRIBUTED ARCHIT...A FLEXIBLE SOFTWARE/HARDWARE ADAPTIVE NETWORK FOR EMBEDDED DISTRIBUTED ARCHIT...
A FLEXIBLE SOFTWARE/HARDWARE ADAPTIVE NETWORK FOR EMBEDDED DISTRIBUTED ARCHIT...
 
Address Interleaving in NoCs
Address Interleaving in NoCsAddress Interleaving in NoCs
Address Interleaving in NoCs
 
C0351725
C0351725C0351725
C0351725
 
Crosslayertermpaper
CrosslayertermpaperCrosslayertermpaper
Crosslayertermpaper
 
CPCRT: Crosslayered and Power Conserved Routing Topology for congestion Cont...
CPCRT: Crosslayered and Power Conserved Routing Topology  for congestion Cont...CPCRT: Crosslayered and Power Conserved Routing Topology  for congestion Cont...
CPCRT: Crosslayered and Power Conserved Routing Topology for congestion Cont...
 
  • 1. Latency Reduction of Selected Data Streams in Network-on-Chips for Adaptive Manycore Systems

Thilo Pionteck, Christoph Osterloh
Institute of Computer Engineering, Universität zu Lübeck, 23538 Lübeck, Germany
Email: {pionteck, osterloh}@iti.uni-luebeck.de

Carsten Albrecht
Dräger Medical GmbH, 23558 Lübeck, Germany
Email: Carsten.Albrecht@draeger.com

978-1-4244-8971-8/10/$26.00 © 2010 IEEE

Abstract—This paper reviews Network-on-Chip architectures with prioritization of selected data streams targeting runtime reconfigurable manycore systems. The common idea of these architectures is to minimize the latency of selected packet transmissions by either bypassing or parallelizing processing stages in routers or by using dedicated links bypassing complete routers. Potential classes of selected data streams are latency-critical messages, e.g. cache accesses in multiprocessor systems, or systems with semi-static data streams, i.e. systems in which the same components continuously exchange data for a longer period. The review categorizes the diverse architectures and evaluates their pros and cons in terms of universality, hardware efficiency and support of changing traffic patterns.

I. INTRODUCTION

With the emergence of manycore systems and the increased need for scalable global on-chip communication architectures, Network-on-Chips (NoCs) are becoming the dominant communication architecture for complex system designs. Compared to shared buses and point-to-point connections, NoCs feature high scalability, high throughput and cost efficiency in terms of area and power [1]. The main drawback of NoCs is that packets have to pass several routers along their path. At each router, packets compete for router resources while going through a complex processing pipeline [2]. Depending on the number of hops, this results in a significant communication latency, limiting the overall system performance. If a large number of consecutive packets follow the same path, some internal router processing steps are not even required, as, for example, the routing decision will be the same for all packets. If these processing steps are left out, latency as well as power consumption can be reduced.

A way to minimize the number of hops is to customize the network topology according to the communication characteristics of the processing elements for a given application scenario. Yet, this contradicts the idea of a universal communication architecture, increases the design effort, and is not applicable to systems with a wide range of application scenarios. This approach also implies that type and location of processing cores are fixed during the lifetime of a system. For runtime reconfigurable systems where processing cores can be exchanged at runtime, this assumption does not hold. Such systems show changing communication patterns during system lifetime and ask for regular communication architectures.

If the number of hops cannot be reduced, communication latency can be reduced by lowering the latency of individual routers. Appropriate techniques are speculative execution of router pipeline stages in parallel [3], [4] and pre-computing routing decisions using look-ahead schemes [5], [6], [7]. End-to-end latency can also be reduced by using adaptive routing schemes, which allow packets to bypass nodes with high congestion. The work presented in [8] describes such a NoC in combination with the ability to bypass the router pipeline. Common disadvantages of these approaches are their increased hardware effort and a latency unsuitable for latency-critical messages. Based on the observation that only a certain amount of messages are latency-critical or show semi-static characteristics, it is favorable to prioritize these kinds of data only. In [9], the composition and amount of latency-critical messages in shared-memory chip multi-processor (CMP) systems are analyzed. The authors identify protocol requests, acknowledgment packets and critical word packets in read and write transactions as the main categories of latency-critical messages. It is also shown that data traffic exhibits a strong temporal and spatial locality and accounts for about 18% of the overall traffic. Examples for semi-static data streams are given in [10]. Here, implementations of wireless communication standards are analyzed with regard to their inter-module communication characteristics. The authors show that for long periods subsequent data items of a stream follow the same route and have periodic behavior.

The aim of this paper is to provide an overview of diverse approaches for prioritizing latency-critical or semi-static data streams in NoC-based communication architectures. A focus is set on NoC architectures suitable for runtime reconfigurable manycore systems. Irregular NoC architectures such as [11], [12] are not considered, nor are NoCs based on non-mesh topologies such as [13]. The former are restricted to certain communication patterns while the latter ask for high-radix routers, significantly increasing the area footprint.

Criteria for categorizing NoC architectures with latency reduction techniques are given in section II. Among these criteria, the effect of a prioritization is chosen for categorizing the NoC architectures: per end-to-end connection (section III), per router (section IV), or per path segment (section V). A discussion of pros and cons of the architectures with regard to
  • 2. the requirements of runtime reconfigurable manycore systems is given in section VI.

II. DESIGN SPACE

There is a huge design space for latency reduction techniques within NoCs. Focusing on NoCs suitable for adaptive manycore systems and applying the design constraints discussed in the previous section, four features that mainly distinguish the architectures are identified: effect of prioritization, switching technique, kind of decision making and link type.

Effect of prioritization: The aim of all presented NoCs is to minimize the core-to-core latency for packets. Yet, the architectures differ in the effect of their prioritization efforts. Prioritization can be applied once for a core-to-core connection, per router or per path segment. When prioritization is done on the basis of core-to-core connections, it is done only once, at the packet's entry point to the communication network. This prioritization is sustained until the packet reaches its final destination. Applying this technique, packets will normally bypass the standard NoC infrastructure, which results in the highest potential for latency reduction. This is in contrast to prioritization on a per-hop basis. There, prioritization has to be redetermined at every hop, which causes time overhead and thus reduces the achievable latency reduction. In between these two extremes is prioritization on a per path segment basis. Here, packets travel along prioritized path segments. These path segments may not cover the complete route from source to sink. Prioritization decisions have to be made for each path segment individually. As these three prioritization methods are mutually exclusive, they are chosen for categorizing the NoC architectures reviewed in this paper.

Switching technique: For transferring prioritized packets, either packet switching or circuit switching techniques are applied. Packet switching in this context means that a packet travels along several routers from source to destination. At each router, routing decisions and arbitration of network resources have to be done. This is in contrast to circuit switching. Here, a dedicated path from source to destination exists. Note that within this work, even for circuit-switched transmissions the data stream is divided into packets. In general, this is done for comparability with the non-prioritized traffic.

Decision making: This criterion determines whether routing for the prioritized packets is made deterministically or speculatively. The former means that the routing decision is guaranteed to be correct, which is not the case for the latter. Speculative forwarding also carries the danger of generating dead flits, i.e. flits which are sent on a wrong link and which have to be deleted.

Link type: The fast path to be used by prioritized packets can either be physical or logical. A logical fast path is mapped on top of the physical network. The packets follow the same route as non-prioritized packets, only parts of the router are bypassed. Physical fast paths are dedicated connections implemented in silicon.

Typical parameter combinations of latency-optimized NoCs are presented in figure 1. Primary classification has been done according to the effect of prioritization. In addition, figure 1 shows the parameters of the individual NoCs presented in the next sections.

[Fig. 1. NoC prioritization techniques design space. Columns: prioritization; switching technique (packet-switched, circuit-switched); decision making (speculative, deterministic); link type (physical, logical); examples. Rows: core-to-core: [14], [15], [16]; per router: [17], [18], [19]; per path segment: [2], [20], [21], [22].]

III. CORE-TO-CORE PRIORITIZATION

This section comprises NoC architectures featuring core-to-core optimization. For the fast path, they all provide direct connections from source to sink, either by using a bus [14], by providing dedicated long-range links [15] or by forming a logical topology on top of the physical topology [16]. As the first two NoC designs provide additional physical connectivity, they are sub-categorized as heterogeneous NoCs, while the last one is sub-categorized as a logical one.

A. Heterogeneous NoCs

The main characteristic of both architectures presented here is that they employ a standard regular grid-shaped NoC for most of the traffic. Long-distance or latency-sensitive traffic is bypassed using the additional communication infrastructure. An architectural view of both NoCs is given in figure 2.

A combination of a NoC with a shared bus is proposed in [14]. The bus enhanced NoC (BENoC), as shown in figure 2(a), is composed of a packet-switched grid-shaped NoC and a low-latency, low-bandwidth bus. The bus is used for global, latency-critical control signals, provides broadcast as well as multicast capabilities, and can also be used for the configuration and management of the NoC. High-throughput data communication between cores is handled by the NoC. Performance evaluation is performed by means of a dynamic non-uniform cache access architecture (DNUCA) multi-processor system consisting of 16 processors and 64 L2 cache tiles. The authors show that BENoC facilitates an average system speedup of about 300% for several benchmarks compared to a pure NoC-based communication infrastructure. For the test system, the area requirement of the bus is less than 0.1% of the die area for a 0.18 µm process. Area numbers for the NoC are not given.

Another approach is followed by [15]. Here, long-range links are inserted on top of a regular mesh network (see figure 2(b)). The long-range links consist of segments of fixed-length links connected by repeaters with buffering capabilities.
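The primary classification of section II, by effect of prioritization, can be written down as a small lookup. The following is a minimal sketch and not part of the reviewed work: the type and field names (`Prioritization`, `NocDesign`, `category_of`) are hypothetical, the grouping of reference numbers follows figure 1, and the switching and link attributes are filled in only where the text of section III states them explicitly. Everything else is left as `None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Prioritization(Enum):
    CORE_TO_CORE = "core-to-core"
    PER_ROUTER = "per router"
    PER_PATH_SEGMENT = "per path segment"


class Switching(Enum):
    PACKET = "packet-switched"
    CIRCUIT = "circuit-switched"


class Decision(Enum):
    DETERMINISTIC = "deterministic"
    SPECULATIVE = "speculative"


class Link(Enum):
    PHYSICAL = "physical"
    LOGICAL = "logical"


@dataclass(frozen=True)
class NocDesign:
    reference: int                      # citation number used in the paper
    prioritization: Prioritization      # primary classification (figure 1)
    switching: Optional[Switching] = None
    decision: Optional[Decision] = None
    link: Optional[Link] = None


# Grouping by prioritization follows figure 1. Other attributes are set only
# where section III states them; None means "not encoded in this sketch".
DESIGNS = [
    NocDesign(14, Prioritization.CORE_TO_CORE, link=Link.PHYSICAL),
    NocDesign(15, Prioritization.CORE_TO_CORE, link=Link.PHYSICAL),
    NocDesign(16, Prioritization.CORE_TO_CORE,
              switching=Switching.CIRCUIT, link=Link.LOGICAL),
    NocDesign(17, Prioritization.PER_ROUTER),
    NocDesign(18, Prioritization.PER_ROUTER),
    NocDesign(19, Prioritization.PER_ROUTER),
    NocDesign(2, Prioritization.PER_PATH_SEGMENT),
    NocDesign(20, Prioritization.PER_PATH_SEGMENT),
    NocDesign(21, Prioritization.PER_PATH_SEGMENT),
    NocDesign(22, Prioritization.PER_PATH_SEGMENT),
]


def category_of(reference: int) -> Prioritization:
    """Return the primary category of a reviewed design."""
    for design in DESIGNS:
        if design.reference == reference:
            return design.prioritization
    raise KeyError(f"reference [{reference}] is not part of the review")


def references_in(category: Prioritization) -> list:
    """List the citation numbers that fall into one category."""
    return [d.reference for d in DESIGNS if d.prioritization is category]
```

Because the three prioritization methods are mutually exclusive, a single enum field is enough for the primary key; the remaining three axes stay optional so that the sketch does not claim more than the text provides.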
  • 3. Determination of number, length, and location of the long-range links is done at design time for a given application. The long-range links are used for any kind of traffic as long as they provide a shorter path and do not cause deadlocks. The area overhead caused by the long-range links is about 10% for a 4 × 4 mesh with 4 long-range links using a Xilinx Virtex-II FPGA as hardware platform. Overall energy consumption increases by about 1%. For performance evaluation, the critical traffic workload, i.e. the number of injected packets per cycle at which the packet delivery rate rises abruptly, is considered. Results for a 4 × 4 auto industry benchmark and a 5 × 5 telecom benchmark show that the critical traffic workload is increased by 13.6% and 36.3%, respectively. The application-specific insertion of long-range links at design time hampers the usage of this NoC architecture for systems with unforeseeable communication patterns, i.e. adaptive manycore systems. Since the underlying communication architecture is still a regular mesh NoC, this work is nevertheless considered within this paper.

[Fig. 2. Heterogeneous NoC architectures. (a) BENoC [14]: processing core, NoC switch, NoC link, bus. (b) Long-range links [15]: processing core, NoC switch, NoC switch with extra port, repeater, NoC link, long-range link.]

B. Logical Topology

A NoC design with a reconfigurable, circuit-switched logical topology called ReNoC (Reconfigurable NoC) is presented in [16]. Conventional NoC routers are wrapped by topology switches which form a configurable layer between routers and links. These topology switches can either be configured to connect a link to a router port or to directly connect two links with each other, bypassing the router. Thus, it is possible to form logical long links between two cores, two routers, or between a processing core and a router. The logical links are created on top of a static NoC topology and form a logical topology with a combination of circuit-switched and packet-switched elements. Configuration of the topology switches is done according to the communication needs of the currently running application. Details about the configuration process are not given. For a video object decoder application, the authors show that ReNoC facilitates a decrease in power consumption of about 56% compared to a static mesh topology. The topology switches lead to an area increase of about 10%.

IV. PER ROUTER PRIORITIZATION

Prioritization on a per-hop basis is done by providing prioritized access to router resources [17] or by speculative forwarding and execution of router pipeline stages [18], [19]. A summary of the main characteristics of these NoCs is given in table I. Note that latency values are given per router while overhead values are given for complete test systems.

A. Prioritized Access to Router Resources

A dynamic path management scheme on router level is proposed in [17]. The idea is that flits arriving on a frequently used input/output link combination are prioritized over other traffic during switch allocation. The input/output pair forms a fast path, and a virtual channel (VC) is dedicated to the fast flits. Fast paths are determined locally by collecting statistics of intra-router transfer patterns. The router pipeline for flits travelling along a fast path can be reduced further by sending the switch arbitration request to the next node before the flit actually enters the node. To this end, the switch allocation request is sent to the next node while the flit traverses the crossbar. Performance is evaluated using a CMP architecture and executing applications from the SPLASH benchmark. Network latency is reduced by up to 30% and power consumption is reduced by about 2.5% at the cost of an area increase of 1.34%.

B. Speculative Forwarding

Another method to reduce the router pipeline latency is presented in [18]. For each idle input link, the output link that will be used by the next packet transfer is predicted and switch arbitration is speculatively completed. If the prediction hits, the routing computation and switch arbitration stages of the router pipeline are bypassed and switch traversal is completed in a single cycle. Otherwise, packets are transferred to the original router pipeline without any additional latency overhead. Wrongly forwarded flits are masked in the output channel. For prediction, several adaptive and static schemes were proposed and analyzed according to their hit rate. In order to facilitate a better adaptation to different traffic characteristics, it is also possible to implement several prediction schemes for one input link in parallel. The optimal prediction scheme is selected by choosing the one with the highest hit rate over a given time period. In three case studies, the latency is reduced in the range of 30.7% to 48.2% at the cost of an increase in area and power in the range of 6.4% to 15.9% and 8.0% to 10.5%, respectively.

A combination of speculative forwarding and setting up of preferred paths is presented in [19]. The authors adapt the "mad-postman" technique and speculatively forward flits to a pre-configured output, bypassing the router logic. Technically, this is realized by connecting all inputs directly with each output via tri-state buffers. If an output has a preferred input, the corresponding tri-state buffer is preselected and all flits arriving at that input are forwarded. In order to detect mistakenly forwarded flits, incoming flits are also checked for whether the last forwarding led the flit closer to its destination. If this is not the case, the flit is stored in a FIFO and transferred using the standard routing functionality. Mistakenly forwarded flits will be identified as dead flits at the first router that is not part of the preferred path and are deleted. Preferred paths are set up by single-flit packets and can be changed at runtime. The preferred path latency is a function of the number of hops, of
  • 4. the delay of the tri-state buffers and of the links. The area overhead is given as about 13% for a whole test chip.

TABLE I. MAIN CHARACTERISTICS OF NOCS WITH PER HOP PRIORITIZATION
Ref. | No load latency, standard | No load latency, prioritized | Power overhead | Area overhead | Path determination | Speculative pipeline | Dead flits
[17] | 2 clock cycles | 1 clock cycle | −2.5% | 1.34% | automatic | yes | no
[18] | 3 clock cycles | 1 clock cycle | 8.0% to 10.5% | 6.4% to 15.9% | automatic | yes | yes
[19] | 1 clock cycle | delay of tri-state buffer | not given | 13% | manual | no | yes

V. PRIORITIZATION PER PATH SEGMENT

NoC architectures prioritizing packets along path segments spanning several hops are presented in this section. This kind of prioritization is normally done by selecting dedicated VCs, as realized in the NoC designs of [2], [20], [21]. Along the path segments, the prioritized packets bypass router pipeline stages, which results in a reduced latency. A combination of path segment prioritization and a virtual, circuit-switched network topology is presented in [22]. This network creates paths for prioritized packets which may lead from core to core. Thus, this NoC could have been categorized as a core-to-core prioritization architecture. Yet, as the length of the prioritized path is not guaranteed, it is categorized as a per path segment optimizing NoC. A summary of the main characteristics of all NoC architectures from this section is given in table II.

A. Virtual Links Assigned to VCs

The concept of Express Virtual Channels (EVCs) is presented in [2]. At any router port, the set of VCs is partitioned into normal VCs (NVCs) and EVCs. EVCs provide virtual express lanes in the network which are used to bypass intermediate routers by skipping the router pipeline. EVCs are restricted to connecting routers along a single dimension and are not allowed to turn. Focusing on dynamic EVCs, each router can act as a source/sink of EVCs or as a bypass node. The length of each EVC can be configured in advance, allowing dynamic adaptation to different traffic patterns. Packets normally try to acquire the longest possible EVC along their path. In case of high contention for a particular EVC, shorter EVCs can be chosen. If all possible EVCs are occupied, NVCs are used. While virtual express lanes are mapped on top of a regular mesh topology and do not require extra wires, extra control lines are needed for flow control between sinks and sources of individual EVCs. For the SPLASH benchmark, the authors show a latency reduction of 84%, a power reduction of 38% and a throughput improvement of 23%.

An improved version of EVCs [2] based on a hybrid interconnect called NOCHI is given in [20]. The EVC network is supplemented by a control plane comprised of global lines spanning all nodes in a row or column. The global lines are used for exchanging broadcast control information and flow control messages and replace the dedicated point-to-point control wires required in the initial EVC design. They are based on capacitive feed-forward circuits and are extended for one-cycle multi-broadcast abilities and collision detection with node quantity determination. The original EVC flow control mechanism is extended to allow a flexible binding of EVCs of arbitrary length to a node and to a more advanced buffer signaling. Compared to the original EVC design, an additional 44% improvement in latency under heavy load and a reduction of power of up to 8.2% are achieved.

Another approach for setting up direct virtual links at runtime is given in [21]. Based on the current NoC state, virtual point-to-point (VIP) paths are created, allowing packets to bypass the pipeline of intermediate routers. Packets travel along VIPs by using a dedicated VC, for which each router is pre-configured to forward the packet to a designated output. Each router port can be used by at most one VIP connection. In combination with prioritizing VIP packets over normal NoC traffic, this means that VIP connections do not encounter busy channels along their path. VIPs are set up using a simple setup network of small bit-width controlled by a root node. This network is also used to collect the monitoring data of each router. Periodically, the root node checks whether an adaptation of the VIP paths is required and manages the tearing down of old and setting up of new VIPs. Evaluation is done using a multicore SoC with different benchmarks running on the same cores. Results show an average latency reduction of 44% and a power reduction of 17% compared to a conventional NoC.

B. Spatial Division Multiplexing

A combination of a packet-switched NoC and a circuit-switched NoC is proposed in [22]. Using spatial-division multiplexing, network resources are split between a packet-switching sub-network (Pnet) and a circuit-switched sub-network (Cnet). Configuration of the Cnet is done by a light-weight setup network called Snet. Processing of flits arriving at the Pnet is done in the same way as for standard packet-switched NoCs. The only difference is during the routing computation stage. If the Cnet part of the physical output link is free, the flit is moved to the Cnet. The Snet is used to build the longest possible direct link to the destination node. Flits traveling along the Cnet bypass the router pipeline and are sent in a pipelined fashion to the destination node. At the destination node, the flits are either transferred to the local core or are handled in the same way as flits arriving at the Pnet. For synthetic traffic patterns, the authors show latency and power reductions of 45% and 22%, respectively. An area overhead of less than 10% is mainly caused by the Snet.

VI. DISCUSSION

For runtime reconfigurable manycore systems, not only the standard NoC design parameters such as throughput, latency, power and area requirements are relevant. Key properties having high impact on these parameters are the adaptability
  • 5. to diverse traffic patterns, the selective prioritization of certain data streams, and implementation issues. Table III summarizes these relevant parameters for the NoC designs presented.

TABLE II. MAIN CHARACTERISTICS OF NOCS WITH PRIORITIZATION OF PATH SEGMENTS
Ref. | No load latency, standard | No load latency, prioritized | Power reduction | Area overhead | Type of virtual connection | Configuration
[2] | 4 clock cycles | 2 clock cycles | 38% | control lines | arbitrary nodes in one dimension | design time
[20] | 4 clock cycles | 2 clock cycles | 44% | control network | arbitrary nodes in one dimension | runtime
[21] | 5 clock cycles | 2 clock cycles | 18% | 2% | core to core | runtime
[22] | 5 clock cycles | wire delay | 22% | < 10% | core/node to node/core | runtime

Concerning implementation issues, NoC designs for runtime reconfigurable manycore systems are often restricted by the hardware structure of, and design tool limitations for, Xilinx Virtex FPGAs. Apart from a few designs implemented on ASIC-style runtime reconfigurable platforms, these FPGAs form the basis for most runtime reconfigurable systems. Thus, NoC designs have to deal with their restrictions and limitations. The column "technology requirements" of table III lists the hardware requirements of NoC architectures [19] and [20] which hamper a direct realization on the Virtex FPGA platform. Implementation issues that complicate but do not prevent an FPGA realization are given in the column "layout anomalies" of table III. These restrictions are caused by the fact that for runtime reconfigurable FPGA designs, a homogeneous and regular system layout is desirable. Even though it is possible, routing signals through regions to be reconfigured is not advisable. As a result, additional, non-uniform control wires between nodes [2], physical fast paths [14], [15] or additional control networks [20], [21], [22] hamper a smooth design flow.

Another important design feature for runtime reconfigurable manycore systems is the ability to prioritize selected data streams. The column "prioritization of packets" of table III specifies whether prioritization is selectable or fixed for all packets. As pointed out in the introduction, a universal prioritization neglects the fact that often only a small amount of data is latency-sensitive [9]. NoCs that prioritize all packets along a route are well suited for semi-static data streams, yet small control messages might even be delayed if they do not follow the main route. This is especially true for NoC architectures with speculative forwarding such as [17], [18]. Whether or not a NoC guarantees to prioritize a selected data stream is given in the column "guaranteed prioritization" of table III.

With regard to latency, all NoC architectures achieve significant improvements. Yet, they feature significant differences in energy consumption and area overhead. In general, NoCs providing core-to-core prioritization tend to show the highest area increase compared to standard mesh-topology NoC designs. This is a result of the additional physical communication structure such as a bus [14] or long-range links [15], or of the additional logic for setting up virtual topologies [16]. The main advantage of these architectures is their near-optimal communication latency for prioritized data streams. The additional communication infrastructure bypasses the router pipeline at each hop and reduces the communication delay either to the wire delay between nodes [14], [16], or to one clock cycle per hop [15]. In combination with reduced latency, bypassing of routers also leads to energy savings. As the additional physical communication links exist in parallel to the standard NoC, system throughput is increased, too. Another advantage of this NoC category is that latency-sensitive traffic is guaranteed to be prioritized. There is no speculation involved as in the link prioritization architectures presented in subsection B of section IV. When focusing on runtime reconfigurable manycore architectures, a drawback of this NoC category is its limited flexibility. Apart from [16], these architectures are limited either by the message length to be transmitted along the additional infrastructure or by node locations. [14] is optimized to transfer short control messages along the additional network and, thus, is not appropriate for data-intensive semi-static data streams. Manual insertion of long links at design time restricts the placement of runtime exchangeable processing cores to certain locations if they are to make use of these links [15].

Concerning adaptivity to changing traffic patterns, NoC architectures prioritizing on a per hop basis or per segment basis provide a flexible option. The only exception is [2], where the architecture requires dedicated control lines for connecting source and sink of virtual links. Thus, virtual connections have to be defined at design time, which reduces system flexibility in the same way as the manual insertion of long links for NoC architecture [15]. Yet, the extended version of [2] presented in [20] circumvents this limitation. With the exception of [19], prioritization on a per router basis requires neither dedicated control lines nor additional physical bypasses. As a result, these architectures are universally applicable. Yet, they suffer from reduced optimization potential. Flits always have to pass at least some stages of the router pipeline at each hop, which limits the achievable latency reduction. In order to reduce pipeline depth and, thus, the latency as much as possible, often complex speculative pipeline structures are used [17], [18]. This comes at the cost of area as well as power efficiency. Concerning energy efficiency, the approaches of [17], [18] are problematic. Both designs speculatively forward flits and have to check afterwards whether this was correct or not. This increases switching activity and in the case of [18] may also lead to congestion. In addition, the speculation failure rate becomes high with increasing network traffic [2]. For runtime reconfigurable systems, the self-adaptive approach of [17], [18] is favorable. These architectures do not require any configuration for prioritizing frequently chosen connections. Yet, the required configuration of routers in [19] has the advantages that a settling phase is avoided and that the communication network can be better adapted to latency-critical data flows.
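The per-router no-load latencies listed in tables I and II can be turned into a rough path-level comparison. The sketch below assumes a deliberately simple model that is not taken from the reviewed papers: every hop costs the same number of router cycles, every hop of the path is prioritized, and link traversal, serialization and contention are ignored. The names `NO_LOAD_CYCLES`, `path_cycles` and `no_load_saving` are hypothetical, and the hop count in the example is arbitrary.

```python
# Per-router no-load latency in clock cycles as (standard, prioritized),
# copied from tables I and II. [19] and [22] are omitted because their
# prioritized latency is given as a buffer or wire delay, not in cycles.
NO_LOAD_CYCLES = {
    17: (2, 1),
    18: (3, 1),
    2: (4, 2),
    20: (4, 2),
    21: (5, 2),
}


def path_cycles(hops: int, cycles_per_router: int) -> int:
    """Router cycles spent on a path (assumed model: identical cost per hop)."""
    if hops < 0 or cycles_per_router < 0:
        raise ValueError("hops and cycles_per_router must be non-negative")
    return hops * cycles_per_router


def no_load_saving(reference: int) -> float:
    """Fraction of router cycles saved when every hop is prioritized."""
    standard, prioritized = NO_LOAD_CYCLES[reference]
    return 1.0 - prioritized / standard


if __name__ == "__main__":
    hops = 6  # arbitrary example path length
    for ref, (standard, prioritized) in NO_LOAD_CYCLES.items():
        print(
            f"[{ref}] standard: {path_cycles(hops, standard)} cycles, "
            f"prioritized: {path_cycles(hops, prioritized)} cycles, "
            f"saving: {no_load_saving(ref):.0%}"
        )
```

Under this model the relative saving does not depend on the hop count. The reductions reported by the respective authors stem from benchmark runs under load and are therefore not expected to match these no-load figures.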
  • 6. TABLE III. CHARACTERISTICS OF PRESENTED NOC ARCHITECTURES FOR RUNTIME RECONFIGURABLE MANYCORE DESIGNS
Ref. | Kind of prioritization | Prioritization of packets | Guaranteed prioritization | Technology requirements | Layout anomalies | Remarks
[14] | additional bus | selectable | yes | - | bus in tree topology | bus can handle low bandwidth only
[15] | physical long links | destination dependent | yes | - | long links disturb regularity | flexibility restricted by fixed long links
[16] | circuit-switched logical links | fixed | yes | - | - | configurable virtual topology
[17] | reduced pipeline | fixed | no | - | - | -
[18] | reduced pipeline + speculative forwarding | fixed | no | - | - | -
[19] | reduced pipeline + speculative forwarding | fixed | no | tri-state buffers | - | configurable paths
[2] | virtual express channels | fixed | no | - | extra control lines | virtual express channels in one dimension only
[20] | virtual express channels | fixed | no | capacitive feed-forward circuits | control network | virtual express channels in one dimension only
[21] | virtual point-to-point links | selectable | yes | - | setup network | -
[22] | circuit-switched logical links | fixed | yes | - | setup network | -

The VC-based NoCs with prioritization per path segment summarized in subsection A of section V suffer from the same drawback as NoCs prioritizing individual input/output connections per router: flits have to pass at least some router pipeline stages, lowering the achievable latency reduction. The approach of [2] and its extension in [20] also limit virtual links to one dimension. If source and sink are not located in the same dimension, flits have to pass the full router pipeline at least three times. In contrast, [21] allows setting up virtual connections between cores or routers directly. With regard to virtual connections, the approach of [16] is the most flexible one, as a complete virtual topology can be generated on top of the physical network. While still being configurable, this architecture enables the design of an application-specific infrastructure and, thus, is well suited for runtime reconfigurable systems. The only drawback of this approach is that configuration affects an entire link. If a core sends data to two different destinations, a direct point-to-point connection cannot be set up.

VII. ACKNOWLEDGEMENT

This work was funded in part by the German Research Foundation (DFG) within priority programme 1148 under grant reference Ma 1412/5.

REFERENCES

[1] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, "Cost considerations in network on chip," Integration, the VLSI Journal, vol. 38, no. 1, pp. 19–42, 2004.
[2] A. Kumar, L.-S. Peh, P. Kundu, and N. K. Jha, "Express Virtual Channels: Towards the Ideal Interconnection Fabric," in 4th Int. Symp. on Computer Architecture, 2007, pp. 150–161.
[3] R. Mullins, A. West, and S. Moore, "Low-Latency Virtual-Channel Routers for On-Chip Networks," in 31st Int. Symp. on Computer Architecture, 2004, pp. 188–197.
[4] L.-S. Peh and W. Dally, "A delay model and speculative architecture for pipelined routers," in 7th Int. Symp. on High-Performance Computer Architecture (HPCA), 2001, pp. 255–266.
[5] K. Kim, S.-J. Lee, K. Lee, and H.-J. Yoo, "An Arbitration Look-Ahead Scheme for Reducing End-to-End Latency in Networks on Chip," in IEEE Int. Symp. on Circuits and Systems (ISCAS), 2005, pp. 2357–2360.
[6] A. Kodi, A. Louri, and J. Wang, "Design of energy-efficient channel buffers with router bypassing for network-on-chips (NoCs)," in Quality of Electronic Design (ISQED), 2009, pp. 826–832.
[7] L. Xin and C.-s. Choy, "A Low-latency NoC Router with Lookahead Bypass," in IEEE Int. Symp. on Circuits and Systems (ISCAS), 2010, pp. 3981–3984.
[8] A. Kumar, L.-S. Peh, and N. Jha, "Token Flow Control," in 41st IEEE/ACM Int. Symp. on Microarchitecture (MICRO-41), 2008, pp. 342–353.
[9] Z. Li, J. Wu, L. Shang, R. Dick, and Y. Sun, "Latency Criticality Aware On-Chip Communication," in Design, Automation & Test in Europe Conference (DATE), 2009, pp. 1052–1057.
[10] P. T. Wolkotte, G. J. Smit, Rauwerda, and L. T. Smit, "An Energy-Efficient Reconfigurable Circuit-Switched Network-on-Chip," in 19th Int. Parallel and Distributed Processing Symp., 2005, pp. 155a–155a.
[11] J. Chan and S. Parameswaran, "NoCOUT: NoC Topology Generation with Mixed Packet-switched and Point-to-Point Networks," in Asia and South Pacific Design Automation Conference, 2008, pp. 265–270.
[12] B. Grot, J. Hestness, S. Keckler, and O. Mutlu, "Express Cube Topologies for on-Chip Interconnects," in IEEE 15th Int. Symp. on High Performance Computer Architecture (HPCA), 2009, pp. 163–174.
[13] J. Kim, J. Balfour, and W. Dally, "Flattened Butterfly Topology for On-Chip Networks," in 40th IEEE/ACM Int. Symp. on Microarchitecture (MICRO), 2007, pp. 172–182.
[14] R. Manevich, I. Walter, I. Cidon, and A. Kolodny, "Best of Both Worlds: A Bus Enhanced NoC (BENoC)," in 3rd ACM/IEEE Int. Symp. on Networks-on-Chip (NoCS), 2009, pp. 173–182.
[15] U. Ogras and R. Marculescu, "Application-Specific Network-on-Chip Architecture Customization via Long-Range Link Insertion," in Int. Conf. on Computer-Aided Design (ICCAD), 2005, pp. 246–253.
[16] M. Stensgaard and J. Sparso, "ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology," in Second ACM/IEEE Inter. Symp. on Networks-on-Chip (NoCS), 2008, pp. 55–64.
[17] D. Park, R. Das, C. Nicopoulos, J. Kim, N. Vijaykrishnan, R. Iyer, and C. Das, "Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects," in 15th IEEE Symp. on High-Performance Interconnects (HOTI), 2007, pp. 15–20.
[18] H. Matsutani, M. Koibuchi, H. Amano, and T. Yoshinaga, "Prediction router: Yet another low latency on-chip router architecture," in IEEE 15th Int. Symp. on High Performance Computer Architecture (HPCA), 2009, pp. 367–378.
[19] G. Michelogiannakis, D. Pnevmatikatos, and M. Katevenis, "Approaching Ideal NoC Latency with Pre-Configured Routes," in First Int. Symp. on Networks-on-Chip (NOCS), 2007, pp. 153–162.
[20] T. Krishna, A. Kumar, P. Chiang, M. Erez, and L.-S. Peh, "NoC with Near-Ideal Express Virtual Channels Using Global-Line Communication," in 16th IEEE Symp. on High Performance Interconnects (HOTI), 2008, pp. 11–20.
[21] M. Modarressi, A. Tavakkol, and H. Sarbazi-Azad, "Virtual Point-to-Point Connections for NoCs," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 855–868, 2010.
[22] M. Modarressi, H. Sarbazi-Azad, and M. Arjomand, "A Hybrid Packet-Circuit Switched on-Chip Network Based on SDM," in Design, Automation & Test in Europe Conference (DATE), 2009, pp. 566–569.