Arteris NoC SoC Interconnect presentation given by Jonah Probell at ARM Technology Conference 9-11 Nov 2010. Explains how traditional AXI fabrics require huge numbers of wires and lead to routing congestion, and how network-on-chip interconnects address routing congestion by using fewer wires. Explains the basics of NoC packetization and serialization.
2. P&R congestion is the focus of EDA "...upstream tools need to be clairvoyant deep into the layout." "The worst crises are when you're deep into the layout and realize that my floorplan's no good. So how do you avoid that? Well, what's needed are clairvoyant tools. That is, a chain of steps where each step already knows a little bit about the changes downstream." "The synthesizer can, this year, avoid congestion; and congestion is really the killer of schedules." -Aart de Geus, Synopsys Symposium 2010
3. Interconnects, logically. The interconnect transports AXI transactions between masters and slaves. The means of transportation are not defined by the AXI spec. [Diagram: several masters and slaves connected through the interconnect via AXI.]
4. Interconnect physically The interconnect lives in the hallways between IP cores. The width of the links affects the compactness of the die.
5. 1. Growing interface complexity. AHB channels: Address (32 bits + a few control), Write data (data width), Read data (data width). AXI channels: Write address (32 bits + a few control), Write data (data width + control), Read data (data width + control), Read address (32 bits + a few control), Write response (a few signals). Signal counts by data width:
  Data width:    32   64   128
  AHB signals:  113  177   305
  AXI signals:  204  272   408
7. 3. Relative wire cost growing. Transistor sizes shrink faster than wire widths. 286 CPU (1982): 69 mm². Atom N450 (2010): 66 mm². Chips are, on average, the same size as ever.
9. Packetizing AXI to transport transactions. At the master side: read address, write address, and write data are packetized into a request packet; the response packet is depacketized into read data and write response. [Diagram: request from master, response to master.]
10. Packetizing AXI to transport transactions. At the slave side: the request packet is depacketized into read address, write address, and write data; read data and write response are packetized into a response packet. [Diagram: request to slave, response to master.]
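The packetize/depacketize step above can be sketched in software. This is an illustrative model only, not Arteris' actual packet format; the `RequestPacket` type and the header fields (`addr`, `len`, `write`) are assumptions made for the sketch.

```python
# Sketch: folding the separate AXI write-address and write-data channels
# into one request packet that travels over a single generic link.
# Field names are illustrative, not Arteris' wire format.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RequestPacket:
    header: dict        # encodes address/control from the AXI AW channel
    payload: List[int]  # data beats from the AXI W channel

def packetize_write(awaddr: int, awlen: int, wdata: List[int]) -> RequestPacket:
    """Master side: combine address/control and data into one packet."""
    # AXI encodes a burst of N beats as AWLEN = N - 1
    assert len(wdata) == awlen + 1
    header = {"addr": awaddr, "len": awlen, "write": True}
    return RequestPacket(header, wdata)

def depacketize_write(pkt: RequestPacket) -> Tuple[int, int, List[int]]:
    """Slave side: recover the address, burst length, and data beats."""
    return pkt.header["addr"], pkt.header["len"], pkt.payload
```

A four-beat write burst round-trips through `packetize_write` and `depacketize_write` unchanged, which is the whole point: the interconnect in between only ever sees generic packets.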
11. Serializing With a packetized protocol, serializing data simply requires a register and a mux. Serializing packets is much easier than serializing the AXI interface protocol.
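A rough software model of that register-and-mux serializer, assuming the packet is already a list of equal-width words and that the total bit count divides evenly by the link width. The term "flit" and the parameter names are illustrative, not Arteris terminology.

```python
# Sketch: serializing packet words over a narrow link. In hardware this
# is a holding register plus a mux that selects which slice of the
# register drives the link wires each cycle.
from typing import List

def serialize(words: List[int], word_width: int, link_width: int) -> List[int]:
    """Concatenate the words MSB-first, then emit link_width-bit flits,
    one per cycle. Assumes word_width * len(words) % link_width == 0."""
    total_bits = word_width * len(words)
    value = 0
    for w in words:
        value = (value << word_width) | w   # load the holding register
    flits = []
    for shift in range(total_bits - link_width, -1, -link_width):
        flits.append((value >> shift) & ((1 << link_width) - 1))  # mux select
    return flits
```

For example, one 16-bit word over an 8-bit link takes two cycles, and two 8-bit words over a 4-bit link take four; the wire count halves while the cycle count doubles.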
12. Throughput and wires. [Diagram: four link configurations, each shown as a sequence of header and data cycles.]
(a) Link width = data width + header width: header penalty = 0.
(b) Link width = header width: header penalty = 1 cycle per transaction.
(c) Link width < data width: header penalty > 1 cycle per transaction.
(d) Link width = data width: header penalty = 1 cycle per transaction.
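The header-penalty arithmetic above can be checked with a small model. It assumes a single-beat payload and that the header rides alongside the data in the same cycle only when the link is at least header width + data width wide; the 32-bit header and 128-bit data widths below are illustrative, not figures from the presentation.

```python
# Sketch: cycles to move one transaction (header_bits of header plus
# data_bits of payload) over a link of a given width. Widths illustrative.
def transaction_cycles(header_bits: int, data_bits: int, link_width: int) -> int:
    if link_width >= header_bits + data_bits:
        return 1  # header travels beside the data: zero header penalty
    header_cycles = -(-header_bits // link_width)  # ceiling division
    data_cycles = -(-data_bits // link_width)
    return header_cycles + data_cycles
```

With a 32-bit header and 128 bits of data: a 160-bit link moves the transaction in 1 cycle (zero penalty), a 128-bit link takes 2 cycles (1 header cycle of penalty), and a 32-bit link takes 5 cycles (1 header cycle plus 4 serialized data cycles), mirroring the trade-off the slide diagrams.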
13. Selection of link width. [Diagram: L2, DDR, peripherals.] Place cores with high communication throughput and low latency requirements near each other. Use zero-header-penalty links between such cores. Use narrow links for long paths to low-throughput peripherals. This minimizes the number of long wires for P&R.
14. Experimental packetized link width results, obtained with the Arteris FlexNoC packetized interconnect generator:
  Data width:                   32   64   128
  AHB signals:                 113  177   305
  AXI signals:                 204  272   408
  Packets with 0 penalty cycles: 146  218   362
  Packets with 1 penalty cycle:   84  156   300
16. Summary Routing congestion is the problem of the decade for chip implementation. AXI is expensive in wires. Packetizing and serializing transaction data effectively reduces routing congestion.
Place-and-route wire congestion is the problem of the decade for the EDA industry.
An interconnect, done right, is a black box that simply allows masters to perform transactions with slaves without consideration of the internal topology.
Narrower interconnects allow a smaller chip floorplan.
AXI has separate channels for write address, write data, read address, read data, and write response. This allows reads to pass writes, which improves the performance of CPUs accessing caches. However, AXI requires those extra wires everywhere else in the chip, too.
The average number of IP cores in chip designs sets a new record each year. The largest that I have seen is about 100.
Wire widths and average wire lengths are shrinking, but not as fast as transistors are. With more logic and relatively fewer wire tracks in the same square millimeter, routing congestion is increasing. Die sizes, and therefore the lengths of the longest wires, are the same as ever, so wires grow larger and larger relative to gates and as a portion of die area.
Increasing congestion raises costs in: * silicon area (floorplan compactness) * manufacturing cost (metal layers, vias, and reliability) * time to market (timing closure and ECOs)
Packetizing is a necessary first step in our approach to reducing wire congestion. Address and control signals are encoded in a header and transported on the same generic link as the data. Link wires are untyped, meaning that they can carry header bits in one cycle and data bits in another.
A narrow link transfers wider data elements serially over multiple cycles. A wide link transmits multiple narrower data elements simultaneously in parallel.
The typical view of packetized data transmission. A configuration that uses fewer long wires where less throughput is required. A configuration that compromises on wire savings so that single-cycle transactions have less overhead. A configuration that ensures maximum throughput and minimum latency between those IPs where it is necessary.
Configure narrow links for the long wires around the chip to relatively low-bandwidth peripherals such as USB, Flash, I2C, and GPIO. Configure links with low header penalty for high-performance connections such as video and graphics cores. Configure maximum-throughput, minimum-latency, zero-header-penalty links between CPUs and caches.
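As an illustration only (all core names and widths here are hypothetical, not from the presentation), that per-connection guidance might be captured as a configuration table fed to an interconnect generator:

```python
# Hypothetical per-connection link plan following the slide's guidance:
# wide zero-penalty links for CPU<->cache, moderate links for media
# cores, narrow links on the long paths to slow peripherals.
link_plan = {
    ("cpu", "l2_cache"): {"link_width": 160, "goal": "zero header penalty"},
    ("gpu", "ddr_ctrl"): {"link_width": 128, "goal": "1-cycle header penalty"},
    ("cpu", "usb"):      {"link_width": 32,  "goal": "fewest long wires"},
    ("cpu", "gpio"):     {"link_width": 32,  "goal": "fewest long wires"},
}
```

The peripheral links trade cycles for wires, which is exactly where the trade is cheap: those paths are long but carry little traffic.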
The number of wires in a physical link using AXI, AHB, or Arteris packet connections is shown. Remember: a packet-based link with zero header penalty gives the full throughput of the data width times the clock speed, but with fewer wires. A packet-based link with one header cycle per transaction uses fewer wires than AHB while keeping the benefits of the AXI protocol.
This is a small piece of a layout congestion diagram from a chip design done twice, once with an interconnect using AXI based links and once with an Arteris packet based interconnect. The same floorplan was used in both cases.
An ounce of prevention is worth a pound of cure. Solving a problem early in a process takes much less effort than solving it later. Physical synthesis uses placement awareness to reduce the average wire length. A serialized interconnect reduces the total number of wires in the first place.