Soft Polynomials (I) Pvt. Ltd.
PCI EXPRESS
SYSTEM
ARCHITECTURE
Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
Contents
• Architectural Perspective
• Architecture Overview
• Address Spaces & Transaction Routing
• Packet-Based Transactions
• ACK / NAK Protocol
• QoS / TCs / VCs and Arbitration
• Flow Control
• Interrupts
• Error Detection and Handling
• Physical Layer Logic
• Electrical Physical Layer
ARCHITECTURAL PERSPECTIVE
Introduction to PCI Express
 PCI Express is the third-generation high-performance I/O bus used to
interconnect peripheral devices in applications such as computing and
communication platforms.
 It is used in mobile, desktop, workstation, server, embedded computing and
communication platforms.
The Role of the Original PCI Solution
Don’t Throw Away What is Good !!! Keep It !!!
 Same usage model and load-store communication model.
 Support familiar transactions.
 Memory read/write, IO read/write and configuration read/write transactions.
 The memory, IO and configuration address space model is the same as PCI and PCI-X
address spaces.
 By maintaining the address space model, existing OSs and driver software will run in
a PCI Express system without any modifications.
 In other words, PCI Express is software backwards compatible with PCI and PCI-X
systems.
 It supports chip-to-chip interconnect and board-to-board interconnect.
Make Improvements for the Future
 It Implements a serial, point-to-point type interconnect for
communication between two devices.
 Multiple PCI Express devices are interconnected via the use of
switches.
 The PCI Express transmission and reception data rate is 2.5 Gbits/sec per
Lane, in each direction.
 A packet-based communication protocol.
 Hot Plug/Hot Swap
Looking into the Future
 In the future, PCI Express communication frequencies are expected to
double and quadruple, to 5 Gbits/sec and 10 Gbits/sec.
Taking advantage of these frequencies will require Physical
Layer re-design of a device with no changes necessary to the
higher layers of the device design.
Bus Performances and Number of Slots Compared
Bus Type        Clock Frequency      Peak Bandwidth*     Card Slots per Bus
PCI 32-bit      33 MHz               133 Mbytes/sec      4 – 5
PCI 32-bit      66 MHz               266 Mbytes/sec      1 – 2
PCI-X 32-bit    66 MHz               266 Mbytes/sec      4
PCI-X 32-bit    133 MHz              533 Mbytes/sec      1 – 2
PCI-X 32-bit    266 MHz effective    1066 Mbytes/sec     1
PCI-X 32-bit    533 MHz effective    2131 Mbytes/sec     1
* Double all these bandwidth numbers for 64-bit bus implementations
PCI Express Aggregate Throughput
 A PCI Express interconnect that connects two devices together is
referred to as a Link.
 A Link consists of either x1, x2, x4, x8, x12, x16 or x32 signal pairs in
each direction.
 These signals are referred to as Lanes.
 To support a greater degree of robustness during data transmission and
reception, each byte of data transmitted is converted into a 10-bit code
(via an 8b/10b encoder in the transmitting device).
 The result is 25% additional overhead to transmit a byte of data.
 To obtain the aggregate bandwidth numbers in the table, multiply 2.5
Gbits/sec by 2 (for each direction), then multiply by the number of Lanes,
and finally divide by 10 bits per byte (to account for the 8b/10b encoding).

Link Width                          x1    x2    x4    x8    x12    x16    x32
Aggregate Bandwidth (Gbytes/sec)    0.5   1     2     4     6      8      16

Table: PCI Express Aggregate Throughput for Various Link Widths
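The arithmetic above can be sketched in a few lines of Python (not part of the original deck; the function name is illustrative):

```python
def aggregate_bandwidth_gbytes(lanes, line_rate_gbps=2.5):
    """Aggregate link bandwidth in Gbytes/sec: 2.5 Gbits/sec per lane,
    times 2 directions, times the number of lanes, divided by 10 bits
    per byte to account for the 8b/10b encoding overhead."""
    return line_rate_gbps * 2 * lanes / 10

for width in (1, 2, 4, 8, 12, 16, 32):
    print(f"x{width}: {aggregate_bandwidth_gbytes(width)} Gbytes/sec")
```

Running this reproduces every column of the table, e.g. 0.5 Gbytes/sec for a x1 Link and 16 Gbytes/sec for a x32 Link.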
Performance Per Pin Compared
Comparison of Performance Per Pin for Various Buses
 In the figure, the first 7 bars are associated with PCI and PCI-X buses,
where we assume 84 pins per device.
 This includes 46 signal pins, interrupt and power management pins, and
error pins; the remainder are power and ground pins.
 The last bar, associated with a x8 PCI Express Link, assumes 40 pins per
device, which include 32 signal lines (8 differential pairs per direction);
the rest are power and ground pins.
I/O Bus Architecture Perspective
33 MHz PCI Bus Based System
 The PCI bus clock is 33 MHz.
 The address bus width is 32 bits (4GB memory address space), although
PCI optionally supports 64-bit address bus.
 The data bus width is implemented as either 32-bits or 64-bits depending
on bus performance requirement.
 The address and data bus signals are multiplexed on the same pins (AD
bus) to reduce pin count.
 PCI supports 12 transaction types.
 A PCI master device implements a minimum of 49 signals.
33 MHz PCI Bus Based Platform
33 MHz PCI Based System Showing Implementation of a PCI-to-PCI Bridge
Typical PCI Burst Memory Read Bus Cycle
Bus / Machine Cycles
PCI Transaction Model - Programmed IO
PCI Transaction Model
 The CPU communicates with a PCI peripheral such as an Ethernet device.
 Software commands the CPU to initiate a memory or IO read/write cycle.
 The North bridge arbitrates for use of the PCI bus and, when it wins
ownership of the bus, generates a PCI memory or IO read/write bus cycle.
 During the first clock of this bus cycle (known as the address phase),
all target devices decode the address.
 One target decodes the address and claims the transaction.
 The master communicates with the claiming target.
 Data is transferred between master and target in subsequent clocks after
the address phase of the bus cycle.
 Either 4 bytes or 8 bytes of data are transferred per clock, depending on
the PCI bus width.
 The bus cycle is referred to as a burst bus cycle if data is transferred
back-to-back between master and target during multiple data phases of that
bus cycle.
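The peak-bandwidth figures in the earlier bus-comparison table follow directly from this burst model (one data phase per clock). A small Python check, not from the original deck:

```python
def pci_peak_bandwidth_mbytes(clock_mhz, bus_width_bytes):
    """Peak PCI bandwidth assuming one data phase per clock during a burst."""
    return clock_mhz * bus_width_bytes

# 32-bit (4-byte) PCI at 33.33 MHz -> about 133 Mbytes/sec,
# 64-bit (8-byte) PCI at 66.66 MHz -> about 533 Mbytes/sec.
print(round(pci_peak_bandwidth_mbytes(33.33, 4)))
print(round(pci_peak_bandwidth_mbytes(66.66, 8)))
```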
PCI EXPRESS
ARCHITECTURE OVERVIEW
PCI Cards - Slots
Slot Pin Allotment
PCI Express Transactions
 Communication involves the transmission and reception of
packets called Transaction Layer packets (TLPs).
 PCI Express transactions can be grouped into four categories:
1) Memory
2) IO
3) Configuration
4) Message Transactions
 Transactions are defined as a series of one or more packet
transmissions required to complete an information transfer
between a requester and a completer.
Transaction Type                            Non-Posted or Posted
Memory Read                                 Non-Posted
Memory Write                                Posted
Memory Read Lock                            Non-Posted
IO Read                                     Non-Posted
IO Write                                    Non-Posted
Configuration Read (Type 0 and Type 1)      Non-Posted
Configuration Write (Type 0 and Type 1)     Non-Posted
Message                                     Posted

Table: PCI Express Non-Posted and Posted Transactions
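The classification above can be captured as a small lookup (a sketch, not from the original deck; names are illustrative):

```python
# Posted transactions receive no completion TLP; non-posted ones do.
POSTED = {
    "Memory Read": False,
    "Memory Write": True,
    "Memory Read Lock": False,
    "IO Read": False,
    "IO Write": False,
    "Configuration Read": False,   # Type 0 and Type 1
    "Configuration Write": False,  # Type 0 and Type 1
    "Message": True,
}

def expects_completion(transaction_type):
    """A requester must wait for a completion only for non-posted types."""
    return not POSTED[transaction_type]
```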
Non-Posted Transactions
 For non-posted transactions, a requester transmits a TLP request packet
to a completer.
 At a later time, the completer returns a TLP completion packet back to
the requester.
 The purpose of the completion TLP is to confirm to the requester that the
completer has received the request TLP.
Posted Transactions
 For posted transactions, a requester transmits a TLP request packet to a
completer.
 The completer, however, does NOT return a completion TLP back to the
requester.
 Posted transactions are optimized for best performance in completing the
transaction, at the expense of the requester not having knowledge of
successful reception of the request by the completer.
PCI Express Transaction Protocol
PCI Express TLP Packet Types
Non – Posted Read Transactions
 Requesters may be root complex or endpoint devices
(endpoints do not initiate configuration read/write requests
however).
 The request TLP is routed through the fabric of switches using
information in the header portion of the TLP.
 The packet makes its way to a targeted completer.
 The completer can be a root complex, switches, bridges or
endpoints.
 The completer can return up to 4 KBytes of data per CplD
packet.
 The completion packet contains routing information necessary
to route the packet back to the requester.
 If a completer is unable to obtain the requested data as a result of an
error, it returns a completion packet without data (Cpl) and an error
status indication.
 The requester determines how to handle the error at the software layer.
Non-Posted Read Transaction Protocol
Non – Posted Read Transactions for
Locked Requests
 The requester can only be a root complex, which initiates a locked
request on behalf of the CPU.
 Endpoints are not allowed to initiate locked requests.
 The completer can only be a legacy endpoint.
 The entire path from root complex to the endpoint is locked
including the ingress and egress port of switches in the
pathway.
Non-Posted Locked Read Transaction Protocol
 The completer creates one or more locked completion TLPs with data
(CplDLk), along with a completion status.
 The requester uses a tag field in the completion to associate it with a
request TLP of the same tag value it transmitted earlier.
 Use of a tag in the request and completion TLPs allows a requester to
manage multiple outstanding transactions.
 If the completer is unable to obtain the requested data as a result of an
error, it returns a completion packet without data (CplLk) and an error
status indication within the packet.
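The tag-matching scheme can be sketched as follows (a Python illustration, not from the original deck; the 5-bit tag limit is the spec's default, and the class/field names are invented for clarity):

```python
class Requester:
    """Sketch of tag management for multiple outstanding non-posted requests."""

    def __init__(self):
        self.next_tag = 0
        self.outstanding = {}  # tag -> description of the pending request

    def send_request(self, description):
        tag = self.next_tag
        self.next_tag = (self.next_tag + 1) % 32  # default 5-bit tag field
        self.outstanding[tag] = description
        return tag  # the tag travels in the request TLP header

    def receive_completion(self, tag, status):
        request = self.outstanding.pop(tag)  # match completion to request by tag
        if status != "SC":  # anything other than Successful Completion
            raise IOError(f"completion error for {request}: {status}")
        return request
```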
 The requester who receives the error notification via the CplLk
TLP must assume that atomicity of the lock is no longer
guaranteed and thus determine how to handle the error at the
software layer.
 The path from requester to completer remains locked until the
requester at a later time transmits an unlock message to the
completer.
 The path and ingress/egress ports of a switch that the unlock
message passes through are unlocked.
Non – Posted Write Transactions
 Non-posted write request TLPs include IO write request (IOWr),
configuration write request type 0 or type 1 (CfgWr0, CfgWr1)
TLPs.
 Memory write requests and message requests are posted requests.
 Requesters may be a root complex or endpoint device (though not for
configuration write requests).
 The completer creates a single completion packet without data
(Cpl) to confirm reception of the write request.
Non-Posted Write Transaction Protocol
 The requester gets confirmation notification that the write
request did make its way successfully to the completer.
 If the completer is unable to successfully write the data in the
request to the final destination or if the write request packet
reaches the completer in error, then it returns a completion packet
without data (Cpl) but with an error status indication.
 The requester who receives the error notification via the Cpl
TLP determines how to handle the error at the software layer.
Posted Memory Write Transactions
 If the write request is received by the completer in error, or if the
completer is unable to write the posted write data to the final destination
due to an internal error, the requester is not informed via the hardware
protocol.
 The completer could log an error and generate an error
message notification to the root complex. Error handling software
manages the error.
Posted Memory Write Transaction Protocol
Posted Message Transactions
 There are two categories of message request TLPs, Msg and MsgD.
 Some message requests propagate from requester to completer, some are
broadcast requests from the root complex to all endpoints, some are
transmitted by an endpoint to the root complex.
 The completer accepts any data that may be contained in the packet (if the
packet is MsgD) and/or performs the task specified by the message.
 Message request support eliminates the need for side-band signals in a PCI
Express system.
 They are used for PCI-style legacy interrupt signaling, power management
protocol, error signaling, unlocking a path in the PCI Express fabric, slot
power support, hot plug protocol, and vendor-defined purposes.
Posted Message Transaction Protocol
Memory Read Originated by CPU,
Targeting an Endpoint
Non-Posted Memory Read Originated by CPU and Targeting an Endpoint
ADDRESS SPACES & TRANSACTION ROUTING
Introduction
 Unlike shared-bus architectures such as PCI and PCI-X, where
traffic is visible to each device and routing is mainly a concern of
bridges, PCI Express devices are dependent on each other to accept
traffic or forward it in the direction of the ultimate recipient.
 As traffic arrives at the inbound side of a link interface (called the
ingress port), the device checks for errors, then makes one of three
decisions:
1) Accept the traffic and use it internally.
2) Forward the traffic to the appropriate outbound (egress) port.
3) Reject the traffic because it is neither the intended target nor an interface to it
(note that there are also other reasons why traffic may be rejected)
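The three-way ingress decision can be sketched in Python (not from the original deck; `targets()` and `route()` are hypothetical device methods standing in for address decode and routing-table lookup):

```python
def handle_ingress(tlp, device):
    """Sketch of the decision an ingress port makes for an arriving TLP."""
    if device.targets(tlp):
        return "accept"             # 1) consume the traffic internally
    egress = device.route(tlp)
    if egress is not None:
        return ("forward", egress)  # 2) pass it toward the ultimate recipient
    return "reject"                 # 3) neither the target nor a path to it
```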
Multi-Port PCI Express Devices Have Routing Responsibilities
Receivers Check For Three
Types of Link Traffic
 Assuming a link is fully operational, the physical layer receiver interface of each device is
prepared to monitor the logical idle condition and detect the arrival of the three types of
link traffic:
1) Ordered Sets
2) DLLPs
3) TLPs
 Using control (K) symbols which accompany the traffic to determine framing boundaries
and traffic type, PCI Express devices then make a distinction between traffic which is local
to the link vs. traffic which may require routing to other links (e.g. TLPs).
 Local link traffic, which includes Ordered Sets and Data Link Layer Packets (DLLPs),
isn’t forwarded and carries no routing information.
 Transaction Layer Packets (TLPs) can and do move from link to link, using routing
information contained in the packet headers.
Multi-port Devices Assume
the Routing Burden
 It should be apparent in Figure that devices with multiple PCI
Express ports are responsible for handling their own traffic as well as
forwarding other traffic between ingress ports and any enabled
egress ports.
 Also note that while peer-to-peer transaction support is required of
switches, it is optional for a multi-port Root Complex.
 It is up to the system designer to account for peer-to-peer traffic when
selecting devices and laying out a motherboard.
Endpoints Have Limited
Routing Responsibilities
 It should also be apparent in Figure that endpoint devices have
a single link interface and lack the ability to route inbound traffic
to other links.
For this reason, and because they don’t reside on shared busses,
endpoints never expect to see ingress port traffic which is not
intended for them (this is different than shared-bus PCI(X), where
devices commonly decode addresses and commands not targeting
them).
Endpoint routing is limited to accepting or rejecting transactions
presented to them.
System Routing Strategy Is
Programmed
 Before transactions can be generated by a requester, accepted by the completer,
and forwarded by any devices in the path between the two, all devices must be
configured to enforce the system transaction routing scheme.
Routing is based on traffic type, system memory and IO address assignments, etc.
In keeping with PCI plug-and-play configuration methods, each PCI Express
device is discovered, memory and IO address resources are assigned to them,
and switch/bridge devices are programmed to forward transactions on their
behalf.
Once routing is programmed, bus mastering and target address decoding are
enabled.
Thereafter, devices are prepared to generate, accept, forward, or reject
transactions as necessary.
Two Types of Local Link
Traffic
 Local traffic occurs between the transmit interface of one
device and the receive interface of its neighbour for the purpose
of managing the link itself.
 This traffic is never forwarded or flow controlled; when sent, it
must be accepted.
 Local traffic is further classified as Ordered Sets exchanged
between the Physical Layers of two devices on a link or Data
Link Layer packets (DLLPs) exchanged between the Data Link
Layers of the two devices.
Ordered Sets
 These are sent by each physical layer transmitter to the
physical layer of the corresponding receiver to initiate link
training, compensate for clock tolerance, or transition a link to
and from the Electrical Idle state.
As indicated in the table, there are five types of Ordered Sets.
Each Ordered Set is constructed of 10-bit control (K) symbols that are
created within the physical layer.
Ordered Set Types
PCI Express Link Local Traffic: Ordered Sets
P C I :
Please
Calm down
Immediately
We are taking a
break
PACKET – BASED TRANSACTIONS
Introduction to the Packet-Based Protocol
 With the exception of the logical idle indication and physical layer Ordered
Sets, all information moves across an active PCI Express link in fundamental
chunks called packets which are comprised of 10 bit control (K) and data (D)
symbols.
The two major classes of packets exchanged between two PCI Express devices
are high level Transaction Layer Packets (TLPs), and low-level link maintenance
packets called Data Link Layer Packets (DLLPs).
Collectively, the various TLPs and DLLPs allow two devices to perform
memory, IO, and Configuration Space transactions reliably and use messages to
initiate power management events, generate interrupts, report errors, etc.
 The figure depicts TLPs and DLLPs on a PCI Express link.
Why Use A Packet-Based
Transaction Protocol
TLP And DLLP Packets
Packet Formats Are Well Defined
 Each PCI Express packet has a known size and format, and the packet
header (positioned at the beginning of each DLLP and TLP packet) indicates
the packet type and the presence of any optional fields.
The size of each packet field is either fixed or defined by the packet type.
The size of any data payload is conveyed in the TLP header Length field.
Once a transfer commences, there are no early transaction terminations by the
recipient.
This structured packet format makes it possible to insert additional information
into the packet into prescribed locations, including framing symbols, CRC, and a
packet sequence number (TLPs only).
Framing Symbols Indicate
Packet Boundaries
 Each TLP and DLLP packet sent is framed with a Start and End
control symbol, clearly defining the packet boundaries to the
receiver.
Note that the Start and End control (K) symbols appended to
packets by the transmitting device are 10 bits each.
A PCI Express receiver must properly decode a complete 10 bit
symbol before concluding link activity is beginning or ending.
Unexpected or unrecognized control symbols are handled as
errors.
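The framing rules above can be illustrated symbolically (a Python sketch, not from the original deck; real Start/End symbols are 10-bit 8b/10b control codes, and strings merely stand in for them here):

```python
STP, END = "STP", "END"  # stand-ins for the Start and End control symbols

def frame(payload_symbols):
    """Transmitter side: bracket the packet with framing symbols."""
    return [STP] + list(payload_symbols) + [END]

def deframe(stream):
    """Receiver side: packet boundaries come from the framing symbols."""
    if not stream or stream[0] != STP or stream[-1] != END:
        raise ValueError("unexpected control symbol")  # handled as an error
    return stream[1:-1]
```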
Transaction Layer Packets
 High-level transactions originate at the device core of the
transmitting device and terminate at the core of the receiving
device.
The Transaction Layer is the starting point in the assembly of
outbound Transaction Layer Packets (TLPs), and the end point for
disassembly of inbound TLPs at the receiver.
Along the way, the Data Link Layer and Physical Layer of each
device contribute to the packet assembly and disassembly.
PCI Express Layered Protocol And TLP Assembly/Disassembly
CRC Protects Entire
Packet
 Unlike the side-band parity signals used by PCI devices during the address and
each data phase of a transaction, the in-band 16-bit or 32-bit PCI Express CRC
value “protects” the entire packet (other than framing symbols).
In addition to CRC, TLP packets also have a packet sequence number
appended to them by the transmitter so that if an error is detected at the
receiver, the specific packet(s) which were received in error may be resent.
The transmitter maintains a copy of each TLP sent in a Retry Buffer until it is
checked and acknowledged by the receiver.
This TLP acknowledgement mechanism (sometimes referred to as the Ack/Nak
protocol) forms the basis of link-level TLP error correction. It is
especially important in deep topologies, where devices may be many links
away from the host and CPU intervention would otherwise be needed whenever
an error occurs.
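The retry-buffer behavior can be sketched as follows (a Python illustration, not from the original deck; the 12-bit sequence number matches the TLP sequence field, but the class and method names are invented):

```python
from collections import OrderedDict

class RetryBuffer:
    """Sketch of the transmitter's replay buffer for the Ack/Nak protocol."""

    def __init__(self):
        self.seq = 0
        self.pending = OrderedDict()  # sequence number -> saved TLP copy

    def send(self, tlp):
        self.pending[self.seq] = tlp          # keep a copy until acknowledged
        self.seq = (self.seq + 1) % 4096      # 12-bit sequence number
        return tlp

    def ack(self, seq_num):
        # An Ack covers the named TLP and all earlier unacknowledged ones.
        for s in list(self.pending):
            del self.pending[s]
            if s == seq_num:
                break

    def nak(self):
        # On a Nak, replay every unacknowledged TLP, oldest first, in order.
        return list(self.pending.values())
```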
ACK / NAK PROTOCOL
Introduction
 ‘Reliable’ transport of TLPs from one device to another device across the
Link.
 The use of ACK DLLPs to confirm reception of TLPs, and the use of NAK
DLLPs to indicate reception of TLPs in error, is explained.
Reliable Transport of
TLPs Across Each Link
 The function of the Data Link Layer (shown in the figure) is twofold:
• ‘Reliable’ transport of TLPs from one device to another device across the
Link.
• The receiver’s Transaction Layer should receive TLPs in the same order that
the transmitter sent them. The Data Link Layer must preserve this order
despite any occurrence of errors that require TLPs to be replayed (retried).
Data Link Layer
Overview of the ACK/NAK Protocol
Elements of the ACK/NAK
Protocol
Packet order is maintained by the transmitter’s and
receiver’s Transaction Layer.
Elements of the ACK/NAK Protocol
QOS / TCS / VCS AND
ARBITRATION
Quality of Service
 Quality of Service (QoS) is a generic term that normally refers to
the ability of a network or other entity (in our case, PCI Express) to
provide predictable latency and bandwidth.
 Note that QoS can only be supported when the system and
device-specific software is PCI Express aware.
 QoS can involve many elements of performance including:
1) Transmission rate
2) Effective Bandwidth
3) Latency
4) Error rate
5) Other parameters that affect performance
 Several features of PCI Express architecture provide the
mechanisms that make QoS achievable.
The PCI Express features that support QoS include:
• Traffic Classes (TCs)
• Virtual Channels (VCs)
• Port Arbitration
• Virtual Channel Arbitration
• Link Flow Control
 PCI Express uses these features to support two general classes of
transactions that can benefit from the PCI Express implementation
of QoS.
Isochronous Transactions
 These transactions require a constant bus bandwidth at
regular intervals along with guaranteed latency.
 Isochronous transactions are most often used when a
synchronous connection is required between two devices.
Iso (same) + chronous (time)
Asynchronous Transactions
 This class of transactions involves a wide variety of applications that
have widely varying requirements for bandwidth and latency.
QoS can provide the more demanding applications (those
requiring higher bandwidth and shorter latencies) with higher priority
than the less demanding applications.
In this way, software can establish a hierarchy of traffic classes for
transactions that permits differentiation of transaction priority based
on their requirements.
The specification refers to this capability as differentiated services.
Isochronous
Transaction Support
Synchronous Versus Isochronous Transactions
Example Application of Isochronous Transaction
Isochronous Transaction Management
 Management of an isochronous communications channel is based
on a Traffic Class (TC) value and an associated Virtual Channel
(VC) number that software assigns during initialization.
Hardware components, including the Requester of a transaction and all
devices in the path between the requester and completer, are configured to
transport the isochronous transactions from link to link via a
high-priority virtual channel.
The requester initiates isochronous transactions that include a TC
value representing the desired QoS.
 The Requester injects isochronous packets into the fabric at the required rate
(service interval), and all devices in the path between the Requester and
Completer must be configured to support the transport of the isochronous
transactions at the specified interval.
Any intermediate device along the path must convert the TC to the associated
VC used to control transaction arbitration.
This arbitration results in the desired bandwidth and latency for transactions
with the assigned TC.
Note that the TC value remains constant for a given transaction while the VC
number may change from link to link.
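The constant-TC, per-link-VC relationship can be shown with a tiny lookup (a Python sketch, not from the original deck; the TC/VC values are invented for illustration):

```python
# The TC value in the TLP header stays fixed end to end;
# each link independently maps that TC onto one of its VCs.
link_tc_to_vc = [
    {7: 1, 0: 0},  # link 0: TC7 travels on VC1 here
    {7: 3, 0: 0},  # link 1: the same TC7 is mapped to VC3 on this link
]

def vc_for(tc, link):
    """Return the virtual channel carrying a given traffic class on a link."""
    return link_tc_to_vc[link][tc]
```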
Differentiated Services
 Various types of asynchronous traffic (all traffic other than
isochronous) have different priority from the system perspective.
 For example, Ethernet traffic requires higher priority (smaller
latencies) than mass storage transactions.
 PCI Express software can establish different TC values and
associated virtual channels and can set up the communications paths
to ensure different delivery policies are established as required.
 Note that the specification does not define specific methods for
identifying delivery requirements or the policies to be used when
setting up differentiated services.
Perspective on QOS/TC/VC
and Arbitration
Traffic Classes and Virtual Channels
 During initialization a PCI Express device-driver
communicates the levels of QoS that it desires for its
transactions, and the operating system returns TC values that
correspond to the QoS requested.
 The TC value ultimately determines the relative priority of a
given transaction as it traverses the PCI Express fabric.
FLOW CONTROL
Introduction
 Flow control guarantees that transmitters will never send
Transaction Layer Packets (TLPs) that the receiver can’t
accept.
 This prevents receive buffer over-runs and eliminates the
need for inefficient disconnects, retries, and wait-states on the
link.
Flow Control Concept
 Before a transaction packet can be sent across a link to the receiving port, the
transmitting port must verify that the receiving port has sufficient buffer space to
accept the transaction to be sent.
 If the transaction is rejected due to insufficient buffer space, the transaction is
resent (retried) until the transaction completes.
 This procedure can severely reduce the efficiency of a bus, by wasting bus
bandwidth when other transactions are ready to be sent.
 The Flow Control mechanism would be ineffective if only one transaction
stream were pending transmission across a link.
 PCI Express improves link efficiency by implementing multiple flow-control
buffers for separate transaction streams.
Flow Control Concept
 If the Flow Control buffer for one VC is full, the transmitter can advance to
another VC buffer and send transactions associated with it.
 The link Flow Control mechanism uses a credit-based mechanism that allows
the transmitting port to check buffer space availability at the receiving port.
 During initialization each receiver reports the size of its receive buffers (in
Flow Control credits) to the port at the opposite end of the link.
 The receiving port continues to update the transmitting port regularly by
transmitting the number of credits that have been freed up.
 This is accomplished via Flow Control DLLPs.
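The credit mechanism can be sketched in a few lines; the class and method names below are invented for illustration, and the real specification tracks separate credit types (posted, non-posted, and completion headers and data) with modulo counters rather than a single counter.

```python
# Illustrative sketch of credit-based flow control for one virtual channel.
# Names are invented, not from the PCI Express specification.

class TransmitterCredits:
    """Tracks receiver buffer space, in credits, as seen by the transmitter."""

    def __init__(self, credits_advertised):
        # Set during initialization, when the receiver reports its buffer size.
        self.credits_available = credits_advertised

    def can_send(self, tlp_credits):
        return self.credits_available >= tlp_credits

    def consume(self, tlp_credits):
        # Called when a TLP is transmitted toward the receiver.
        assert self.can_send(tlp_credits), "would overrun receiver buffer"
        self.credits_available -= tlp_credits

    def update(self, freed_credits):
        # Called when a Flow Control DLLP reports freed buffer space.
        self.credits_available += freed_credits


vc0 = TransmitterCredits(credits_advertised=8)
vc0.consume(3)            # send a 3-credit TLP
print(vc0.can_send(6))    # False: only 5 credits remain
vc0.update(4)             # Flow Control DLLP returns 4 credits
print(vc0.can_send(6))    # True: 9 credits available
```

When `can_send` returns False for one VC, the transmitter can move on to another VC's buffer, exactly as described above.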
Flow Control Concept
Location of Flow Control Logic
INTERRUPTS
Two Methods of Interrupt
Delivery
 When a native PCI Express function depends on delivering
interrupts to call its device driver, Message Signaled Interrupts
(MSI) must be used.
 However, in the event that a device connecting to a PCI Express
link cannot use MSIs (i.e., legacy devices), an alternate mechanism
is defined.
Two Methods of Interrupt
Delivery
Native PCI Express Interrupt Delivery
 A Message Signaled Interrupt is not a PCI Express Message;
instead, it is simply a Memory Write transaction.
 A memory write associated with an MSI can be distinguished from
other memory writes only by the address it targets, which is
reserved by the system for interrupt delivery.
Two Methods of
Interrupt Delivery
Legacy PCI Interrupt Delivery
 Legacy functions use one of the interrupt lines to signal an
interrupt.
 An INTx# signal is asserted to request interrupt service and
deasserted when the interrupt service routine accesses a
device-specific register, indicating the interrupt is being serviced.
 PCI Express defines in-band messages that act as virtual INTx#
wires, which target the interrupt controller located typically
within the Root Complex.
Two Methods of
Interrupt Delivery
Native PCI Express and Legacy PCI Interrupt Delivery
Message Signaled
Interrupts
 Message Signaled Interrupts (MSIs) are delivered to the Root
Complex via memory write transactions.
The MSI Capability register provides all the information that the
device requires to signal MSIs.
This register is set up by configuration software and includes the
following information:
• Target memory address
• Data Value to be written to the specified address location
• The number of messages that can be encoded into the data
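A sketch of the idea: software programs an address and data value into the capability, and the function later raises an interrupt by issuing an ordinary memory write built from those values. Field and function names are simplified here, and 0xFEE00000 is only an example of a system-reserved interrupt-delivery address.

```python
# Sketch of MSI delivery: the function signals an interrupt by performing
# a plain memory write using values that configuration software placed in
# its MSI Capability register. Names and encodings are illustrative.

class MsiCapability:
    def __init__(self):
        self.message_address = 0          # target memory address
        self.message_data = 0             # base data value to write
        self.vectors_allocated = 1        # messages granted by software

def signal_msi(cap, vector=0):
    """Return the (address, data) pair of the memory write for this vector."""
    assert vector < cap.vectors_allocated
    # Multiple messages are encoded by modifying low-order bits of the data.
    return cap.message_address, cap.message_data | vector

cap = MsiCapability()
cap.message_address = 0xFEE00000   # example interrupt-delivery address
cap.message_data = 0x40
cap.vectors_allocated = 4

print(hex(signal_msi(cap, vector=2)[1]))   # 0x42
```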
Message Signaled
Interrupts
64-bit MSI Capability Register Format
Message Signaled
Interrupts
32-bit MSI Capability Register Set Format
ERROR DETECTION AND
HANDLING
ERROR DETECTION AND
HANDLING
Background
 The original PCI bus implementation provides for basic parity
checks on each transaction as it passes between two devices
residing on the same bus.
 The PCI architecture provides a method for reporting the
following types of errors:
• data parity errors
• data parity errors during multicast transactions (special cycles)
• address and command parity errors
• other types of errors
Background
 Errors reported via PERR# are considered potentially recoverable.
 How the errors reported via PERR# are handled is left up to the
implementer.
 Error handling may involve only hardware, device-specific software, or
system software.
 Errors signaled via SERR# are reported to the system and handled by
system software.
 PCI-X uses the same error reporting signals as PCI, but defines specific error
handling requirements depending on whether device-specific error handling
software is present.
Introduction to PCI
Express Error Management
PCI Express defines a variety of mechanisms used for
checking errors, reporting those errors and identifying the
appropriate hardware and software elements for handling
these errors.
Introduction to PCI Express
Error Management
PCI Express Error Checking Mechanisms
Each layer of the PCI Express interface includes error checking capability as
described in the following sections.
PCI Express Error Checking
Mechanisms
Transaction Layer Errors
 The transaction layer checks are performed only by the Requester and
Completer.
Switches do not perform transaction layer checks on packets passing through them.
Checks performed at the transaction layer include:
• ECRC check failure (optional check based on end-to-end CRC)
• Malformed TLP (error in packet format)
• Completion Time-outs during split transactions
• Flow Control Protocol errors (optional)
• Unsupported Requests
• Data Corruption (reported as a poisoned packet)
• Completer Abort (optional)
• Unexpected Completion (completion does not match any Request pending completion)
• Receiver Overflow (optional check)
PCI Express Error Checking
Mechanisms
Data Link Layer Errors
 Link layer error checks occur within a device involved in delivering
the transaction between the requester and completer functions.
This includes the requesting device, intermediate switches, and the
completing device.
Checks performed at the link layer include:
• LCRC check failure for TLPs
• Sequence Number check for TLPs
• LCRC check failure for DLLPs
• Replay Time-out
• Replay Number Rollover
• Data Link Layer Protocol errors
PCI Express Error Checking
Mechanisms
Physical Layer Errors
 Physical layer error checks are also performed by all devices
involved in delivering the transaction, including the requesting
device, intermediate switches, and the completing device.
Checks performed at the physical layer include:
• Receiver errors (optional)
• Training errors (optional)
Error Reporting
Mechanisms
 PCI Express provides three mechanisms for establishing the error
reporting policy.
 These mechanisms are controlled and reported through
configuration registers mapped into three distinct regions of
configuration space.
• PCI-compatible Registers (required) —
1) This error reporting mechanism provides backward compatibility with existing
PCI compatible software and is enabled via the PCI configuration Command
Register.
2) This approach requires that PCI Express errors be mapped to PCI compatible
error registers.
Error Reporting
Mechanisms
• PCI Express Capability Registers (required) —
1) This mechanism is available only to software that has knowledge of PCI
Express.
2) This required error reporting is enabled via the PCI Express Device Control
Register mapped within PCI-compatible configuration space.
• PCI Express Advanced Error Reporting Registers (optional) —
1) This mechanism involves registers mapped into the extended configuration
address space.
2) PCI Express compatible software enables error reporting for individual errors
via the Error Mask Register.
Error Reporting
Mechanisms
Location of PCI Express Error-Related Configuration Registers
Error Handling
Mechanisms
• Correctable errors — handled by hardware
• Uncorrectable errors-nonfatal — handled by device-specific
software
• Uncorrectable errors-fatal — handled by system software
PHYSICAL LAYER LOGIC
PHYSICAL LAYER LOGIC
 The Physical Layer contains Byte Striping and Un-Striping logic, a
Scrambler and De-Scrambler, an 8b/10b Encoder and Decoder, Elastic
Buffers and more.
The transmit logic of the Physical Layer essentially processes
packets arriving from the Data Link Layer, then converts them
into a serial bit stream.
Physical Layer Overview
Physical Layer
Physical Layer Overview
Logical and Electrical Sub-Blocks of the Physical Layer
Transmit Logic Overview
Physical Layer Details
Transmit Logic Overview
 Figure shows the elements that make up the transmit logic:
• a multiplexer (mux),
• byte striping logic (only necessary if the link implements more than one data
lane),
• scramblers,
• 8b/10b encoders,
• and parallel-to-serial converters.
 TLPs and DLLPs from the Data Link layer are clocked into a Tx
(transmit) Buffer.
 With the aid of a multiplexer, the Physical Layer frames the TLPs
or DLLPs with Start and End characters.
Transmit Logic Overview
 These characters are framing symbols which the receiver device
uses to detect start and end of packet.
The framed packet is sent to the Byte Striping logic which
multiplexes the bytes of the packet onto the Lanes.
One byte of the packet is transferred on one Lane, the next byte
on the next Lane and so on for the available Lanes.
The Scrambler uses an algorithm to pseudo-randomly scramble
each byte of the packet.
The Start and End framing bytes are not scrambled.
Transmit Logic Overview
 Scrambling eliminates repetitive patterns in the bit stream.
Repetitive patterns result in large amounts of energy concentrated in
discrete frequencies which leads to significant EMI noise generation.
Scrambling spreads energy over a frequency range, hence
minimizing average EMI noise generated.
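A minimal sketch of an additive (XOR) scrambler of this kind, built on a 16-bit LFSR with polynomial x^16 + x^5 + x^4 + x^3 + 1, the form used by the 2.5 GT/s PCI Express scrambler. The exact bit ordering, per-symbol advancement, and COM-triggered reset rules of the specification are not reproduced; this only shows the principle, including the key property that running the same LFSR at the receiver descrambles the stream.

```python
# Illustrative additive scrambler: XOR the data bits with a pseudo-random
# LFSR stream. Tap ordering here is one plausible realization, not the
# specification's exact serialization.

def lfsr_stream(nbits, seed=0xFFFF):
    """Yield nbits of scrambling bits from a 16-bit LFSR."""
    state = seed
    for _ in range(nbits):
        out = state & 1                          # output bit
        fb = ((state >> 0) ^ (state >> 3) ^
              (state >> 4) ^ (state >> 5)) & 1   # taps from the polynomial
        state = (state >> 1) | (fb << 15)
        yield out

def scramble(data, seed=0xFFFF):
    """XOR each data bit with the LFSR stream; applying it twice descrambles."""
    bits = lfsr_stream(8 * len(data), seed)
    return bytes(b ^ sum(next(bits) << i for i in range(8)) for b in data)

payload = b"repetitive pattern 0000000000"
tx = scramble(payload)   # whitened bit stream on the wire
rx = scramble(tx)        # same LFSR at the receiver recovers the data
assert rx == payload
```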
The scrambled 8-bit characters (8b characters) are encoded into
10-bit symbols (10b symbols) by the 8b/10b Encoder logic.
And yes, there is a bandwidth cost: each byte expands into a 10-bit
symbol, so only 80% of the raw bit rate carries data.
Transmit Logic Overview
A Character is defined as the 8-bit un-encoded byte of a packet.
A Symbol is defined as the 10-bit encoded equivalent of the 8-bit
character.
The 10b symbols are converted to a serial bit stream by the Parallel-
to-Serial converter.
This logic uses a 2.5 GHz clock to serially clock the packets out on
each Lane.
The serial bit stream is sent to the electrical sub-block which
differentially transmits the packet onto each Lane of the Link.
Receive Logic Overview
 Figure shows the elements that make up the receiver logic:
• receive PLL,
• serial-to-parallel converter,
• elastic buffer,
• 8b/10b decoder,
• de-scrambler,
• byte un-striping logic (only necessary if the link implements more than one
data lane),
• control character removal circuit,
• and a packet receive buffer.
As the data bit stream is received, the receiver PLL is synchronized
to the clock frequency with which the packet was clocked out of the
remote transmitter device.
Receive Logic Overview
 The transitions in the incoming serial bit stream are used to
re-synchronize the PLL circuitry and maintain bit and symbol lock while
generating a clock recovered from the data bit stream.
 The serial-to-parallel converter is clocked by the recovered clock
and outputs 10b symbols.
 The 10b symbols are clocked into the Elastic Buffer using the
recovered clock associated with the receiver PLL.
 The 10b symbols are converted back to 8b characters by the 8b/10b
Decoder.
The Start and End characters that frame a packet are eliminated.
Receive Logic Overview
 The 8b/10b Decoder also looks for errors in the incoming 10b symbols.
For example, error detection logic can check for invalid 10b symbols
or detect a missing Start or End character.
The De-Scrambler reproduces the de-scrambled packet stream from
the incoming scrambled packet stream.
The De-Scrambler implements the inverse of the algorithm
implemented in the transmitter Scrambler.
The bytes from each Lane are un-striped to form a serial byte stream
that is loaded into the receive buffer to feed to the Data Link layer.
ELECTRICAL PHYSICAL LAYER
Electrical Physical Layer
Overview
 This sub-block contains differential drivers (transmitters) and
differential receivers (receivers).
 The transmitter serializes outbound symbols on each Lane and
converts the bit stream to electrical signals that have an embedded
clock.
 The receiver detects electrical signaling on each Lane and generates
a serial bit stream that it de-serializes into symbols, and supplies the
symbol stream to the logical Physical Layer along with the clock
recovered from the inbound serial bit stream.
 The drivers and receivers are short-circuit tolerant, making them
ideally suited for hot insertion and removal events.
Electrical Physical Layer
Overview
Electrical Sub-Block of the Physical Layer
ESD and Short Circuit
Requirements
 All signals and power pins must withstand (without damage) a
2000V Electro-Static Discharge (ESD) using the human body
model and 500V using the charged device model.
The ESD requirement not only protects against electro-static
damage, but facilitates support of surprise hot insertion and
removal events.
Transmitters and receivers are also required to be short-circuit
tolerant.
Finished!!!
Next session: how to use all this stuff we were trying to learn.
Slide 1
Getting to PCI Express
But first some Background
Slide 2
Interfacing Processor and Peripherals
 Overview
[Figure: the processor (with cache) connects over a memory-I/O bus to
main memory and to I/O controllers for disks, graphics output, and the
network; devices signal the processor via interrupts.]
Slide 3
Introduction
 I/O often viewed as second class to
processor design
– Processor research is cleaner
– System performance given in terms of processor
– Courses often ignore peripherals
– Writing device drivers is not fun
 This is crazy - a computer with no I/O is
pointless
Slide 4
Peripheral design
 As with processors, characteristics of I/O
driven by technology advances
– E.g. properties of disk drives affect how they
should be connected to the processor
– PCs and supercomputers now share the same
architectures, so I/O can make all the difference
 Different requirements from processors
– Performance
– Expandability
– Resilience
Slide 5
Peripheral performance
 Harder to measure than for the processor
– Device characteristics
• Latency / Throughput
– Connection between system and device
– Memory hierarchy
– Operating System
 Assume 100 secs to execute a benchmark
– 90 secs CPU and 10 secs I/O
– If processors get 50% faster per year for the next
5 years, what is the impact?
Slide 6
Relative performance
 CPU time + I/O time = total time (% of I/O time)
 Year 0: 90 + 10 = 100 (10%)
 Year 1: 60 + 10 = 70 (14%)
 :
 Year 5: 12 + 10 = 22 (45%) - nearly half the time is I/O!
Slide 7
IO bandwidth
 Measured in 2 ways depending on application
– How much data can we move through the system
in a given time
• Important for supercomputers with large amounts of
data for, say, weather prediction
– How many IO operations can we do in a given time
• ATM transfers involve small amounts of data that need to
be handled rapidly
 So comparison is hard. Generally
– Response time lowered by handling early
– Throughput increased by handling multiple
requests together
Slide 8
I/O Performance Measures
 I/O bandwidth (throughput) – amount of
information that can be input (output) and
communicated across an interconnect (e.g., a
bus) to the processor/memory (I/O device) per
unit time
1. How much data can we move through the system in a
certain time?
2. How many I/O operations can we do per unit time?
 I/O response time (latency) – the total elapsed
time to accomplish an input or output operation
– An especially important metric in real-time systems
 Many applications require both high throughput
and short response times
Slide 9
I/O System Performance
 Designing an I/O system to meet a set of bandwidth
and/or latency constraints means
Finding the weakest link in the I/O system – the
component that constrains the design
– The processor and memory system
– The underlying interconnection (e.g., bus)
– The I/O controllers
– The I/O devices themselves
(Re)configuring the weakest link to meet the
bandwidth and/or latency requirements
Determining requirements for the rest of the
components and (re)configuring them to support
this latency and/or bandwidth
Slide 10
I/O System Performance Example
 A disk workload consisting of 64KB reads and writes where
the user program executes 200,000 instructions per disk I/O
operation and
– a processor that sustains 3 billion instr/s and averages
100,000 OS instructions to handle an I/O operation
– a memory-I/O bus that sustains a transfer rate of
1000 MB/s
The maximum I/O rate of the processor is
Instr execution rate / Instr per I/O = (3 x 10^9) / ((200 + 100) x 10^3)
= 10,000 I/Os/sec
Each I/O reads/writes 64 KB, so the maximum I/O rate of the bus is
Bus bandwidth / Bytes per I/O = (1000 x 10^6) / (64 x 10^3)
= 15,625 I/Os/sec
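A quick check of both limits (using the slide's 64 x 10^3 bytes per I/O):

```python
# Find the weakest link: compare the processor-limited and bus-limited
# I/O rates from the worked example.

instr_rate = 3e9                   # 3 billion instructions/sec
instr_per_io = 200_000 + 100_000   # user + OS instructions per I/O
bus_bw = 1000e6                    # memory-I/O bus: 1000 MB/s
bytes_per_io = 64e3                # 64 KB per disk read/write

cpu_io_rate = instr_rate / instr_per_io   # processor-limited rate
bus_io_rate = bus_bw / bytes_per_io       # bus-limited rate
print(cpu_io_rate)   # 10000.0 I/Os/sec -- the processor is the weakest link
print(bus_io_rate)   # 15625.0 I/Os/sec
```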
Slide 11
Input and Output Devices
 I/O devices are incredibly diverse with respect to
– Behavior – input, output or storage
– Partner – human or machine
– Data rate – the peak rate at which data can be transferred
between the I/O device and the main memory or processor
Device           | Behavior        | Partner | Data rate (Mb/sec)
Keyboard         | input           | human   | 0.0001
Mouse            | input           | human   | 0.0038
Laser printer    | output          | human   | 3.2000
Graphics display | output          | human   | 800.0000-8000.0000
Network          | input or output | machine | 100.0000-1000.0000
Magnetic disk    | storage         | machine | 240.0000-2560.0000
(an 8-orders-of-magnitude range)
Slide 12
Mouse
 Communicates with
– Pulses from LED
– Increment / decrement counters
 Mice have at least 1 button
– Need click and hold
 Movement is smooth, slower than processor
– Polling
– No submarining
– Software configuration
[Figure: mouse movement tracked from an initial position as increments
and decrements of the X and Y counters, e.g. +20 in X, -20 in Y per
step.]
Slide 13
Mouse guts
Slide 14
Hard disk
 Rotating rigid platters with magnetic
surfaces
 Data read/written via head on armature
– Think record player
 Storage is non-volatile
 Surface divided into tracks
– Several thousand concentric circles
 Track divided in sectors
– 128 or so sectors per track
Slide 15
Diagram
[Figure: stacked platters; each surface is divided into concentric
tracks, and each track into sectors.]
Slide 16
Access time
 Three parts
1. Perform a seek to position arm over correct
track
2. Wait until desired sector passes under head.
Called rotational latency or delay
3. Transfer time to read information off disk
– Usually a sector at a time at 2~4 Mb / sec
– Control is handled by a disk controller,
which can add its own delays.
Slide 17
Calculating time
 Seek time:
– Measure max and divide by two
– More formally: (sum of all possible seeks)/number
of possible seeks
 Latency time:
– Average of half a rotation
– 0.5 rotations / rotation speed (3600~5400 rpm)
– 0.5 / (3600 / 60)
– 0.00833 secs
– 8.3 ms
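Putting the three parts together, a worked example: the rotational term is the slide's 3600 RPM calculation, while the seek time, sector size, transfer rate, and controller overhead are assumed for illustration.

```python
# Average access time = seek + rotational latency + transfer + controller.
# Seek (6 ms), sector size (512 B), transfer rate (2 MB/s), and controller
# overhead (1 ms) are illustrative assumptions.

avg_seek = 6e-3                    # assumed average seek time
rotational = 0.5 / (3600 / 60)     # half a rotation at 3600 RPM
transfer = 512 / 2e6               # one 512 B sector at 2 MB/s
controller = 1e-3                  # assumed controller overhead

access = avg_seek + rotational + transfer + controller
print(round(rotational * 1000, 2))   # 8.33 ms, as computed above
print(round(access * 1000, 2))       # total access time in ms
```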
Slide 18
Comparison
 Currently, 7.25 Gb (7,424,000) per inch
squared
Slide 19
More faking
 The disk drive hides internal optimisations from the external world.
Slide 20
Disk Latency & Bandwidth Milestones
Disk latency is one average seek time plus the rotational
latency. Disk bandwidth is the peak transfer time of formatted
data from the media (not from the cache).
In the time that the disk bandwidth doubles the latency
improves by a factor of only 1.2 to 1.4
Model            | CDC Wren | SG ST41 | SG ST15 | SG ST39 | SG ST37
Speed (RPM)      | 3600     | 5400    | 7200    | 10000   | 15000
Year             | 1983     | 1990    | 1994    | 1998    | 2003
Capacity (GB)    | 0.03     | 1.4     | 4.3     | 9.1     | 73.4
Diameter (in)    | 5.25     | 5.25    | 3.5     | 3.0     | 2.5
Interface        | ST-412   | SCSI    | SCSI    | SCSI    | SCSI
Bandwidth (MB/s) | 0.6      | 4       | 9       | 24      | 86
Latency (msec)   | 48.3     | 17.1    | 12.7    | 8.8     | 5.7
Slide 21
Media Bandwidth/Latency Demands
 Bandwidth requirements
– High quality video
• Digital data = (30 frames/s) × (640 x 480 pixels) × (24-b
color/pixel) = 221 Mb/s (27.625 MB/s)
– High quality audio
• Digital data = (44,100 audio samples/s) × (16-b audio
samples) × (2 audio channels for stereo) = 1.4 Mb/s (0.175
MB/s)
– Compression reduces the bandwidth requirements
considerably
 Latency issues
– How sensitive is your eye (ear) to variations in video
(audio) rates?
– How can you ensure a constant rate of delivery?
– How important is synchronizing the audio and video
streams?
• 15 to 20 ms early to 30 to 40 ms late tolerable
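The two uncompressed bandwidth figures above can be checked directly:

```python
# Uncompressed media bandwidth: frames x pixels x bits/pixel for video,
# samples x bits x channels for audio.

video_bps = 30 * 640 * 480 * 24    # high quality video
audio_bps = 44_100 * 16 * 2        # CD-quality stereo audio

print(video_bps / 1e6)   # 221.184 Mb/s (about 27.6 MB/s)
print(audio_bps / 1e6)   # 1.4112 Mb/s  (about 0.18 MB/s)
```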
Slide 22
Magnetic Disk Examples (www.seagate.com)
Characteristic            | Seagate ST37 | Seagate ST32 | Seagate ST94
Disk diameter (inches)    | 3.5          | 3.5          | 2.5
Capacity (GB)             | 73.4         | 200          | 40
# of surfaces (heads)     | 8            | 4            | 2
Rotation speed (RPM)      | 15,000       | 7,200        | 5,400
Transfer rate (MB/sec)    | 57-86        | 32-58        | 34
Minimum seek (ms)         | 0.2r-0.4w    | 1.0r-1.2w    | 1.5r-2.0w
Average seek (ms)         | 3.6r-3.9w    | 8.5r-9.5w    | 12r-14w
MTTF (hours @ 25°C)       | 1,200,000    | 600,000      | 330,000
Dimensions (inches)       | 1”x4”x5.8”   | 1”x4”x5.8”   | 0.4”x2.7”x3.9”
GB/cu.inch                | 3            | 9            | 10
Power: op/idle/sb (watts) | 20?/12/-     | 12/8/1       | 2.4/1/0.4
GB/watt                   | 4            | 16           | 17
Weight (pounds)           | 1.9          | 1.4          | 0.2
Slide 23
Buses: Connecting I/O devices
 Interfacing subsystems in a computer system
is commonly done with a bus: “a shared
communication link, which uses one set of
wires to connect multiple sub-systems”
Slide 24
Why a bus?
 Main benefits:
– Versatility: new devices easily added
– Low cost: reusing a single set of wires many ways
 Problems:
– Creates a bottleneck
– Tries to be all things to all subsystems
 Comprised of
– Control lines: signal requests and acknowledgements, and
indicate what type of information is on the data lines
– Data lines: carry data and destination / source addresses
Slide 25
Controlling a bus
 As the bus is shared, need a protocol to
manage usage
 Bus transaction consists of
– Sending the address
– Sending / receiving the data
 Note that in buses, we talk about what the
bus does to memory
– During a read, a bus will ‘receive’ data
Slide 26
Bus transaction 1 - disk write
[Figure: three stages (a, b, c) of a disk write, showing the control
and data lines of the bus connecting memory, processor, and disks.]
Slide 27
Bus transaction 2 - disk read
[Figure: two stages (a, b) of a disk read over the same control and
data lines connecting memory, processor, and disks.]
Slide 28
Types of Bus
 Processor-memory bus
– Short and high speed
– Matched to memory system (usually Proprietary)
 I/O buses
– Lengthy,
– Connected to a wide range of devices
– Usually connected to the processor using 1 or 3
 Backplane bus
– Processors, memory and devices on single bus
– Has to balance proc-memory with I/O-memory
– Usually requires extra logic to do this
Slide 29
Bus type diagram
[Figure: (a) a single backplane bus shared by processor, memory, and
I/O devices; (b) a processor-memory bus with bus adapters connecting
separate I/O buses; (c) a processor-memory bus with a bus adapter to a
backplane bus, which in turn connects to I/O buses through further
adapters.]
Slide 30
Synchronous and Asynchronous buses
 Synchronous bus has a clock attached to the
control lines and a fixed protocol for
communicating that is relative to the pulse
 Advantages
– Easy to implement (CC1 read, CC5 return value)
– Requires little logic (FSM to specify)
 Disadvantages
– All devices must run at same rate
– If fast, cannot be long due to clock skew
 Most proc-mem buses are clocked
Slide 31
Asynchronous buses
 No clock, so it can accommodate a variety of
devices (no clock = no skew)
 Needs a handshaking protocol to coordinate
different devices
– Agreed steps to progress through by sender and
receiver
– Harder to implement - needs more control lines
Slide 32
Example handshake - device wants a
word from memory
[Figure: waveforms of the ReadReq, Data, Ack, and DataRdy lines,
annotated with handshake steps 1 through 7.]
Slide 33
FSM control
[Figure: cooperating finite-state machines. I/O device: put the
address on the data lines and assert ReadReq; (2) release the data
lines and deassert ReadReq; (5) read the memory data from the data
lines and assert Ack; (7) deassert Ack, then accept a new I/O request.
Memory: (1) record the address from the data lines and assert Ack;
(3, 4) drop Ack, put the memory data on the data lines, and assert
DataRdy; (6) release the data lines and DataRdy.]
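The seven handshake steps can be written out as straight-line code; this is a sketch with illustrative names, not a cycle-accurate model of the two FSMs.

```python
# Sketch of the asynchronous handshake: a device reads a word from memory
# using ReadReq, Ack, and DataRdy control lines plus shared data lines.

MEMORY = {0x42: "hello"}   # illustrative memory contents

def transaction(address):
    lines = {"ReadReq": 0, "Ack": 0, "DataRdy": 0, "Data": None}

    # Step 1: device puts the address on the data lines, asserts ReadReq.
    lines["Data"], lines["ReadReq"] = address, 1
    # Memory records the address and asserts Ack; seeing Ack, the device
    # releases the data lines and deasserts ReadReq (step 2).
    addr, lines["Ack"] = lines["Data"], 1
    lines["Data"], lines["ReadReq"] = None, 0
    # Steps 3-4: memory drops Ack, drives the data, asserts DataRdy.
    lines["Ack"] = 0
    lines["Data"], lines["DataRdy"] = MEMORY[addr], 1
    # Step 5: device reads the data from the data lines and asserts Ack.
    word, lines["Ack"] = lines["Data"], 1
    # Step 6: memory releases the data lines and DataRdy.
    lines["Data"], lines["DataRdy"] = None, 0
    # Step 7: device deasserts Ack; both sides are ready for a new request.
    lines["Ack"] = 0
    return word

print(transaction(0x42))   # hello
```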
Slide 34
Increasing bus bandwidth
 Key factors
– Data bus width: Wider = fewer cycles for transfer
– Separate vs Multiplexed, data and address lines
• Separating allows transfer in one bus cycle
– Block transfer: Transfer multiple blocks of data in
consecutive cycles without resending addresses
and control signals etc.
Slide 35
Obtaining bus access
 Need one, or more, bus masters to prevent
chaos
 Processor is always a bus master as it needs
to access memory
– Memory is always a slave
 Simplest system as a single master (CPU)
 Problems
– Every transfer needs CPU time
– As peripherals become smarter, this is a waste of
time
 But, multiple masters can cause problems
Slide 36
Bus Arbitration
 Deciding which master gets to go next
– Master issues ‘bus request’ and awaits ‘granted’
 Two key properties
– Bus priority (highest first)
– Bus fairness (even the lowest get a go, eventually)
 Arbitration is an overhead, so good to reduce
it
– Dedicated lines, grant lines, release lines etc.
Slide 37
Different arbitration schemes
 Daisy chain: Bus grant line runs through
devices from highest to lowest
 Very simple, but cannot guarantee fairness
[Figure: the bus arbiter drives a Grant line daisy-chained from
Device 1 (highest priority) through Device n (lowest priority);
Request and Release lines are shared by all devices.]
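A sketch of the priority behaviour (the function name is invented): the grant is absorbed by the first requesting device in chain order, which is exactly why fairness cannot be guaranteed.

```python
# Daisy-chain arbitration: the grant propagates from the highest-priority
# device down the chain and is absorbed by the first device that wants
# the bus, so low-priority devices can starve.

def daisy_chain_grant(requests):
    """requests[i] is True if device i (0 = highest priority) wants the bus.
    Returns the index of the device that captures the grant, or None."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device    # grant absorbed here, not passed along
    return None

print(daisy_chain_grant([False, True, True]))    # 1: device 2 must wait
print(daisy_chain_grant([False, False, False]))  # None: no requests
```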
Slide 38
Centralised Arbitration
 Centralised, parallel: All devices have
separate connections to the bus arbiter
– This is how the PCI backplane bus works (found in
most PCs)
– Can guarantee fairness
– Arbiter can become congested
Slide 39
Distributed
 Distributed arbitration by self selection:
 Each device contains information about
relative importance
 A device places its ID on the bus when it
wants access
 If there is a conflict, the lower priority
devices back down
 Requires separate lines and complex devices
 Used on the Macintosh II series (NuBus)
Slide 40
Collision detection
 Distributed arbitration by collision detection:
 Basically Ethernet
 Everyone tries to grab the bus at once
 If there is a ‘collision’ everyone backs off a
random amount of time
Slide 41
Bus standards
 To ensure machine expansion and peripheral
re-use, there are various standard buses
– IBM PC-AT bus (de-facto standard)
– SCSI (needs controller)
– PCI (started by Intel, now maintained by the PCI-SIG)
– Ethernet
 Bus bandwidth depends on size of transfer
and memory speed
Slide 42
PCI
Type             | Backplane
Data width       | 32-64
Address/data     | Multiplexed
Bus masters      | Multiple
Arbitration      | Central parallel
Clocking         | Synchronous, 33-66 MHz
Theoretical peak | 133-512 MB/sec
Achievable peak  | 80 MB/sec
Max devices      | 1024
Max length       | 50 cm
Bananas          | none
Slide 43
My Old Macintosh
[Figure: the processor connects through a PCI interface / memory
controller to main memory and to the PCI bus. I/O controllers on the
PCI bus serve graphics output, the SCSI bus (disk, CD-ROM, tape),
stereo input/output, serial ports, Ethernet, and the Apple desktop
bus.]
Slide 44
Example: The Pentium 4’s Buses
[Figure: the processor's System Bus (“Front Side Bus”, 64b x 800 MHz =
6.4 GB/s; also 533 and 400 MHz variants) connects to the Memory
Controller Hub (“Northbridge”), which also serves DDR SDRAM main
memory and graphics output (2.0 GB/s). A Hub Bus (8b x 266 MHz) links
it to the I/O Controller Hub (“Southbridge”), which provides 2 serial
ATA ports (150 MB/s), 2 parallel ATA ports (100 MB/s), 8 USB ports
(60 MB/s), Gbit Ethernet (0.266 GB/s), and PCI (32b x 33 MHz).]
Slide 45
Buses in Transition
Companies are transitioning from synchronous, parallel, wide buses to asynchronous narrow buses
– Reflections on wires and clock skew make it difficult to (synchronously) use 16 to 64 parallel wires running at a high clock rate (e.g., ~400 MHz), so companies are transitioning to buses with a few one-way, asynchronous wires running at a very high clock rate (~2 GHz)

                 PCI             PCI Express    ATA           Serial ATA
Total # wires    120             36             80            7
# data wires     32–64 (2-way)   2 x 4 (1-way)  16 (2-way)    2 x 2 (1-way)
Clock (MHz)      33–133          635            50            150
Peak BW (MB/s)   128–1064        300            100           150
Slide 46
ATA Cable Sizes
 Serial ATA cables (red) are much thinner than parallel ATA cables (green)
Slide 47
Giving commands to I/O devices
 Processor must be able to address a device
– Memory mapping: portions of memory are allocated to a device (Base address on a PC)
• Different addresses in the space mean different things
• Could be a read, write or device status address
– Special instructions: machine code for specific devices
• Not a good idea generally
Slide 48
Communicating with the Processor
 Polling
– Process of periodically checking the status bits to see if it is time for the next I/O operation
– Simplest way for a device to communicate (via a shared status register)
– Example: a mouse
– Wasteful of processor time
Slide 49
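The polling loop described on this slide can be sketched in a few lines. Everything here is a toy model: the device class, its READY bit, and the register names are invented for illustration; real hardware would expose a memory-mapped status register instead.

```python
# Toy polling loop. The device model and its READY status bit are
# hypothetical; real hardware exposes a memory-mapped status register.

READY = 0x01  # assumed status-bit layout: bit 0 = data available

class MouseDevice:
    """Toy device that becomes ready only after a few status reads."""
    def __init__(self, ready_after=3):
        self._reads_left = ready_after
        self.data = 42

    def read_status(self):
        self._reads_left -= 1
        return READY if self._reads_left <= 0 else 0

def poll_read(dev, max_polls=1000):
    """Spin on the status register until READY, then read the data.
    Every wasted iteration burns processor time -- the slide's point."""
    wasted = 0
    while not (dev.read_status() & READY):
        wasted += 1
        if wasted >= max_polls:
            raise TimeoutError("device never became ready")
    return dev.data, wasted

value, wasted = poll_read(MouseDevice())
```

The `wasted` counter makes the slide's criticism concrete: every iteration before READY is processor time spent doing nothing useful.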
Interrupts
 Notify the processor when a device needs attention (IRQ lines on a PC)
 Just like exceptions, except
– Interrupt is asynchronous with program execution
• Control unit only checks for I/O interrupts at the start of each instruction execution
– Need further information, such as the identity of the device that caused the interrupt and its priority
• Remember the Cause Register?
Slide 50
Interrupt-Driven I/O
 With I/O interrupts
– Need a way to identify the device generating the interrupt
– Can have different urgencies (so may need to be prioritized)
 Advantages of using interrupts
– Relieves the processor from having to continuously poll for an I/O event; user program progress is only suspended during the actual transfer of I/O data to/from user memory space
 Disadvantage – special hardware is needed to
– Cause an interrupt (I/O device), and detect an interrupt and save the necessary information to resume normal processing after servicing the interrupt (processor)
Slide 52
Interrupt-Driven Input
[Diagram: keyboard input raises an interrupt at the receiver; the processor (1) takes the input interrupt, (2.1) saves state, (2.2) jumps to the input interrupt service routine in memory, (2.3) services the interrupt, and (2.4) returns to the user code.]
Slide 53
Interrupt-Driven Output
[Diagram: the display’s transmitter raises an output interrupt; the processor (1) takes the interrupt, (2.1) saves state, (2.2) jumps to the output interrupt service routine, (2.3) services the interrupt, and (2.4) returns to the user code.]
Slide 54
Transferring Data Between Device and Memory
 We can do this with interrupts and polling
– Works best with low-bandwidth devices, keeping the cost of the controller and interface low
– Burden lies with the processor
 For high-bandwidth devices, we don’t want the processor worrying about every single block
 Need a scheme for high-bandwidth autonomous transfers
Slide 55
Direct Memory Access (DMA)
 Mechanism for offloading the processor and having the device controller transfer data directly
 Still uses the interrupt mechanism, but only to communicate completion of the transfer or an error
 Requires a dedicated controller to conduct the transfer
Slide 56
Doing DMA
 Essentially, the DMA controller becomes bus master and sets up the transfer
 Three steps
– Processor sets up the DMA by supplying
• Device identity
• Operation on device
• Memory address (source or destination)
• Amount to transfer
– DMA controller operates the device, supplies addresses and arbitrates for the bus
– On completion, the controller notifies the processor
Slide 57
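The processor's side of the three-step setup can be sketched as a sequence of register writes. This is a minimal sketch: the register map (DEV_ID, OP, ADDR, COUNT, START) and the controller model are hypothetical, not any real DMA controller's interface.

```python
# Sketch of the processor's side of a DMA setup. The register offsets
# and names below are invented for illustration only.

DEV_ID, OP, ADDR, COUNT, START = 0x00, 0x04, 0x08, 0x0C, 0x10
OP_READ = 1  # device -> memory

class DmaController:
    """Toy controller: records register writes; START triggers the transfer."""
    def __init__(self):
        self.regs = {}
        self.done = False
    def write(self, reg, val):
        self.regs[reg] = val
        if reg == START:
            # The controller becomes bus master, runs the transfer, then
            # raises a completion interrupt (modelled here as a flag).
            self.done = True

def start_dma(dma, device_id, op, mem_addr, nbytes):
    dma.write(DEV_ID, device_id)   # step 1: which device
    dma.write(OP, op)              #         what operation
    dma.write(ADDR, mem_addr)      #         source/destination address
    dma.write(COUNT, nbytes)       #         amount to transfer
    dma.write(START, 1)            # step 2: controller takes over the bus
                                   # step 3: completion signalled by interrupt

ctl = DmaController()
start_dma(ctl, device_id=2, op=OP_READ, mem_addr=0x8000, nbytes=4096)
```

After `start_dma` returns, the processor is free: the controller (not the CPU) supplies addresses and arbitrates for the bus until the completion notification arrives.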
DMA and the memory system
 With DMA, the relationship between memory and processor is changed
– DMA bypasses address translation and the memory hierarchy
 So, should DMA use virtual or physical addresses?
– Virtual addresses: the DMA controller must translate
– Physical addresses: hard to cross a page boundary
Slide 58
DMA address translation
 Can provide the DMA controller with a small address translation table for pages – provided by the OS at transfer time
 Or get the OS to break the transfer into chunks, each chunk relating to a single page
 Regardless, the OS cannot relocate pages during a transfer
Slide 59
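The second option above, breaking a transfer into chunks so that each chunk stays within a single page, can be sketched as follows (a 4 KB page size is assumed for illustration):

```python
PAGE_SIZE = 4096  # assumed page size

def split_into_page_chunks(vaddr, length):
    """Split an address range into (addr, len) pieces that never cross a
    page boundary, so each piece maps to exactly one physical page."""
    chunks = []
    while length > 0:
        room = PAGE_SIZE - (vaddr % PAGE_SIZE)  # bytes left in this page
        n = min(room, length)
        chunks.append((vaddr, n))
        vaddr += n
        length -= n
    return chunks

# A 10 KB transfer starting 256 bytes into a page splits into three
# chunks: the rest of the first page, one full page, and a remainder.
pieces = split_into_page_chunks(0x1100, 10240)
```

Each `(addr, len)` piece can then be handed to the DMA controller with its own physical address, sidestepping the page-boundary problem of physical addressing.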
Evolution vs Revolution
 Evolutionary approaches tend to be invisible to users except for
– Lower cost and better performance
 Revolutionary approaches require new languages and applications
– Looks good on paper
– Must be worth the effort
– KCM
Slide 60
BUSSSS!!!!!
Now let’s dive into the actual scenario
Session 8,9 PCI Express
  • 6. ARCHITECTURAL PERSPECTIVE – Make Improvements for the Future:  It implements a serial, point-to-point type interconnect for communication between two devices.  Multiple PCI Express devices are interconnected via the use of switches.  The PCI Express transmission and reception data rate is 2.5 Gbits/sec.  A packet-based communication protocol.  Hot Plug / Hot Swap support.
  • 7. ARCHITECTURAL PERSPECTIVE – Looking into the Future:  In the future, PCI Express communication frequencies are expected to double and quadruple (to 5 and 10 Gbits/sec). Taking advantage of these frequencies will require a Physical Layer re-design of a device, with no changes necessary to the higher layers of the device design.
  • 8. ARCHITECTURAL PERSPECTIVE – Bus Performances and Number of Slots Compared:

    Bus Type       Clock Frequency      Peak Bandwidth*     Card Slots per Bus
    PCI 32-bit     33 MHz               133 Mbytes/sec      4–5
    PCI 32-bit     66 MHz               266 Mbytes/sec      1–2
    PCI-X 32-bit   66 MHz               266 Mbytes/sec      4
    PCI-X 32-bit   133 MHz              533 Mbytes/sec      1–2
    PCI-X 32-bit   266 MHz effective    1066 Mbytes/sec     1
    PCI-X 32-bit   533 MHz effective    2131 Mbytes/sec     1

    * Double all these bandwidth numbers for 64-bit bus implementations.
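The peak numbers in this table follow directly from bus width times clock (one data phase per clock tick during a burst). A quick sanity check; the helper function is ours, not from the slides:

```python
def pci_peak_mb_per_s(width_bits, clock_mhz):
    """Peak bandwidth = (bus width in bytes) x (clock in MHz):
    PCI/PCI-X transfer one data item per clock tick during a burst."""
    return (width_bits // 8) * clock_mhz

# 32-bit PCI at 33.33 MHz gives the table's ~133 Mbytes/sec; doubling
# the width to 64 bits (or doubling the clock) doubles each figure.
```

This is why the table's footnote can simply say "double for 64-bit": width enters the formula linearly.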
  • 9. ARCHITECTURAL PERSPECTIVE – PCI Express Aggregate Throughput:  A PCI Express interconnect that connects two devices together is referred to as a Link.  A Link consists of either x1, x2, x4, x8, x12, x16 or x32 signal pairs in each direction.  These signals are referred to as Lanes.  To support a greater degree of robustness during data transmission and reception, each byte of data transmitted is converted into a 10-bit code (via an 8b/10b encoder in the transmitting device). The result is 25% additional overhead to transmit a byte of data.
  • 10. PCI Express Aggregate Throughput:  To obtain the aggregate bandwidth numbers in the table, multiply 2.5 Gbits/sec by 2 (for each direction), then multiply by the number of Lanes, and finally divide by 10 bits per Byte (to account for the 8b/10b encoding).

    PCI Express Link Width              x1    x2   x4   x8   x12   x16   x32
    Aggregate Bandwidth (Gbytes/sec)    0.5   1    2    4    6     8     16

    Table: PCI Express Aggregate Throughput for Various Link Widths
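The arithmetic described on this slide, 2.5 Gbits/sec per lane per direction, times two directions, divided by 10 bits per byte after 8b/10b encoding, reproduces every column of the table:

```python
GEN1_LANE_RATE_GBPS = 2.5  # Gbits/sec per lane, per direction (Gen 1)

def aggregate_gb_per_s(lanes):
    """Aggregate bandwidth in GBytes/sec for a Gen-1 link:
    2.5 Gbits/sec x 2 directions x lanes / 10 bits-per-byte (8b/10b)."""
    return GEN1_LANE_RATE_GBPS * 2 * lanes / 10

# Expected values, straight from the table above.
table = {1: 0.5, 2: 1, 4: 2, 8: 4, 12: 6, 16: 8, 32: 16}
```

Note that dividing by 10 (rather than 8) bits per byte is exactly the 25% encoding overhead mentioned on the previous slide.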
  • 11. ARCHITECTURAL PERSPECTIVE – Performance Per Pin Compared (figure: Comparison of Performance Per Pin for Various Buses):  In the figure, the first 7 bars are associated with PCI and PCI-X buses, where we assume 84 pins per device.  This includes 46 signal pins, interrupt and power management pins, error pins, and the remainder are power and ground pins. The last bar, associated with a x8 PCI Express Link, assumes 40 pins per device, which include 32 signal lines (8 differential pairs per direction); the rest are power and ground pins.
  • 12. ARCHITECTURAL PERSPECTIVE – I/O Bus Architecture Perspective: 33 MHz PCI Bus Based System.  The PCI bus clock is 33 MHz.  The address bus width is 32 bits (4 GB memory address space), although PCI optionally supports a 64-bit address bus.  The data bus width is implemented as either 32 bits or 64 bits depending on bus performance requirements.  The address and data bus signals are multiplexed on the same pins (AD bus) to reduce pin count.  PCI supports 12 transaction types.  A PCI master device implements a minimum of 49 signals.
  • 13. I/O Bus Architecture Perspective – 33 MHz PCI Bus Based System: 33 MHz PCI Bus Based Platform (figure).
  • 14. ARCHITECTURAL PERSPECTIVE – I/O Bus Architecture Perspective: 33 MHz PCI Based System Showing Implementation of a PCI-to-PCI Bridge (figure).
  • 15. ARCHITECTURAL PERSPECTIVE – Bus / Machine Cycles: Typical PCI Burst Memory Read Bus Cycle (figure).
  • 16. PCI Transaction Model – Programmed IO: PCI Transaction Model (figure).
  • 17. ARCHITECTURAL PERSPECTIVE – I/O Bus Architecture Perspective: PCI Transaction Model – Programmed IO.  The CPU communicates with a PCI peripheral such as an Ethernet device.  Software commands the CPU to initiate a memory or IO read/write bus cycle.  The North bridge arbitrates for use of the PCI bus and, when it wins ownership of the bus, generates a PCI memory or IO read/write bus cycle.  During the first clock of this bus cycle (known as the address phase), all target devices decode the address.
  • 18. PCI Transaction Model – Programmed IO:  One target decodes the address and claims the transaction. The master communicates with the claiming target.  Data is transferred between master and target in subsequent clocks after the address phase of the bus cycle.  Either 4 bytes or 8 bytes of data are transferred per clock tick depending on the PCI bus width.  The bus cycle is referred to as a burst bus cycle if data is transferred back-to-back between master and target during multiple data phases of that bus cycle.
  • 19. PCI EXPRESS ARCHITECTURE OVERVIEW
  • 20. Architecture Overview – PCI Cards / Slots (figure).
  • 21. Architecture Overview – Slot Pin Allotment (figure).
  • 22. Architecture Overview – Slot Pin Allotment (figure, continued).
  • 23. (figure)
  • 24. ARCHITECTURE OVERVIEW – PCI Express Transactions:  Communication involves the transmission and reception of packets called Transaction Layer Packets (TLPs).  PCI Express transactions can be grouped into four categories: 1) Memory, 2) IO, 3) Configuration, and 4) Message transactions.  Transactions are defined as a series of one or more packet transmissions required to complete an information transfer between a requester and a completer.
  • 25. ARCHITECTURE OVERVIEW – PCI Express Transactions: PCI Express Non-Posted and Posted Transactions.

    Transaction Type                           Non-Posted or Posted
    Memory Read                                Non-Posted
    Memory Write                               Posted
    Memory Read Lock                           Non-Posted
    IO Read                                    Non-Posted
    IO Write                                   Non-Posted
    Configuration Read (Type 0 and Type 1)     Non-Posted
    Configuration Write (Type 0 and Type 1)    Non-Posted
    Message                                    Posted
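The table above can be captured as a simple lookup, e.g. for a verification model. The function name and the plain-string keys are our own choices, not spec terminology; Type 0 and Type 1 configuration requests are both non-posted, so they share one entry each here.

```python
# Posted vs. non-posted classification, transcribed from the table above.
POSTED = {"Memory Write", "Message"}
NON_POSTED = {
    "Memory Read", "Memory Read Lock",
    "IO Read", "IO Write",
    "Configuration Read", "Configuration Write",  # Type 0 and Type 1 alike
}

def expects_completion(transaction):
    """Non-posted requests expect a completion TLP back; posted ones do not."""
    if transaction in POSTED:
        return False
    if transaction in NON_POSTED:
        return True
    raise ValueError(f"unknown transaction type: {transaction}")
```

The asymmetry this encodes is exactly the trade-off the next two slides describe: posted writes give up the confirmation TLP in exchange for performance.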
  • 26. ARCHITECTURE OVERVIEW – PCI Express Transactions: Non-Posted Transactions.  For non-posted transactions, a requester transmits a TLP request packet to a completer.  At a later time, the completer returns a TLP completion packet back to the requester.  The purpose of the completion TLP is to confirm to the requester that the completer has received the request TLP.
  • 27. ARCHITECTURE OVERVIEW – PCI Express Transactions: Posted Transactions.  For posted transactions, a requester transmits a TLP request packet to a completer.  The completer, however, does NOT return a completion TLP back to the requester.  Posted transactions are optimized for best performance in completing the transaction, at the expense of the requester not having knowledge of successful reception of the request by the completer.
  • 28. ARCHITECTURE OVERVIEW – PCI Express Transaction Protocol: PCI Express TLP Packet Types (figure).
  • 29. ARCHITECTURE OVERVIEW Non – Posted Read Transactions  Requesters may be root complex or endpoint devices (endpoints do not initiate configuration read/write requests however).  The request TLP is routed through the fabric of switches using information in the header portion of the TLP.  The packet makes its way to a targeted completer.  The completer can be a root complex, switches, bridges or endpoints. 29Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 30. ARCHITECTURE OVERVIEW Non – Posted Read Transactions  The completer can return up to 4 KBytes of data per CplD packet.  The completion packet contains routing information necessary to route the packet back to the requester.  If a completer is unable to obtain requested data as a result of an error, it returns a completion packet without data (Cpl) and an error status indication. The requester determines how to handle the error at the software layer. 30Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 31. ARCHITECTURE OVERVIEW Non – Posted Read Transactions Non-Posted Read Transaction Protocol 31Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 32. ARCHITECTURE OVERVIEW Non – Posted Read Transactions for Locked Requests  The requester can only be a root complex, which initiates a locked request on behalf of the CPU.  Endpoints are not allowed to initiate locked requests.  The completer can only be a legacy endpoint.  The entire path from root complex to the endpoint is locked, including the ingress and egress ports of the switches in the pathway. 32Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 33. ARCHITECTURE OVERVIEW Non – Posted Read Transactions for Locked Requests Non-Posted Locked Read Transaction Protocol 33Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 34. ARCHITECTURE OVERVIEW Non – Posted Read Transactions for Locked Requests  The completer creates one or more locked completion TLPs with data (CplDLk) along with a completion status.  A requester uses the tag field in the completion to associate it with the request TLP of the same tag value it transmitted earlier.  Use of a tag in the request and completion TLPs allows a requester to manage multiple outstanding transactions.  If the completer is unable to obtain the requested data as a result of an error, it returns a completion packet without data (CplLk) and an error status indication within the packet. 34Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
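The tag-based matching described above can be sketched as a small outstanding-request table in C. This is an illustrative model only; the table size, field names, and return conventions are assumptions, not spec-defined structures:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TAGS 32  /* hypothetical number of outstanding requests */

/* Sketch of a requester's outstanding-request table: the tag in an
 * incoming completion selects the request it answers. */
struct outstanding { bool in_use; uint64_t req_addr; };

static struct outstanding table[MAX_TAGS];

int track_request(uint8_t tag, uint64_t addr)
{
    if (tag >= MAX_TAGS || table[tag].in_use)
        return -1;                    /* tag already outstanding */
    table[tag].in_use = true;
    table[tag].req_addr = addr;
    return 0;
}

/* Called when a completion arrives; retires the matching request. */
int retire_completion(uint8_t tag, uint64_t *addr_out)
{
    if (tag >= MAX_TAGS || !table[tag].in_use)
        return -1;                    /* unexpected completion */
    *addr_out = table[tag].req_addr;
    table[tag].in_use = false;
    return 0;
}
```

Because each tag can match at most one pending request, the requester can keep many reads in flight at once and still pair every completion with its originator.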
  • 35. ARCHITECTURE OVERVIEW Non – Posted Read Transactions for Locked Requests  The requester who receives the error notification via the CplLk TLP must assume that atomicity of the lock is no longer guaranteed and thus determine how to handle the error at the software layer.  The path from requester to completer remains locked until the requester at a later time transmits an unlock message to the completer.  The path and ingress/egress ports of a switch that the unlock message passes through are unlocked. 35Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 36. ARCHITECTURE OVERVIEW Non – Posted Write Transactions  Non-posted write request TLPs include IO write request (IOWr) and configuration write request type 0 or type 1 (CfgWr0, CfgWr1) TLPs.  Memory write requests and message requests are posted requests. A requester may be a root complex or an endpoint device (though endpoints do not initiate configuration write requests).  The completer creates a single completion packet without data (Cpl) to confirm reception of the write request. 36Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 37. ARCHITECTURE OVERVIEW Non – Posted Write Transactions Non-Posted Write Transaction Protocol 37Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 38. ARCHITECTURE OVERVIEW Non – Posted Write Transactions  The requester gets confirmation notification that the write request did make its way successfully to the completer.  If the completer is unable to successfully write the data in the request to the final destination or if the write request packet reaches the completer in error, then it returns a completion packet without data (Cpl) but with an error status indication.  The requester who receives the error notification via the Cpl TLP determines how to handle the error at the software layer. 38Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 39. ARCHITECTURE OVERVIEW Posted Memory Write Transactions  If the write request is received by the completer in error, or the completer is unable to write the posted data to its final destination due to an internal error, the requester is not informed via the hardware protocol.  The completer could log an error and generate an error message notification to the root complex. Error handling software manages the error. 39Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 40. ARCHITECTURE OVERVIEW Posted Memory Write Transactions Posted Memory Write Transaction Protocol 40Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 41. ARCHITECTURE OVERVIEW Posted Message Transactions  There are two categories of message request TLPs, Msg and MsgD.  Some message requests propagate from requester to completer, some are broadcast requests from the root complex to all endpoints, some are transmitted by an endpoint to the root complex.  The completer accepts any data that may be contained in the packet (if the packet is MsgD) and/or performs the task specified by the message.  Message request support eliminates the need for side-band signals in a PCI Express system.  They are used for PCI style legacy interrupt signaling, power management protocol, error signaling, unlocking a path in the PCI Express fabric, slot power support, hot plug protocol, and vendor-defined purposes. 41Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 42. ARCHITECTURE OVERVIEW Posted Message Transactions Posted Message Transaction Protocol 42Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 43. ARCHITECTURE OVERVIEW Memory Read Originated by CPU, Targeting an Endpoint Non-Posted Memory Read Originated by CPU and Targeting an Endpoint 43Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 44. ADDRESS SPACES & TRANSACTIONS ROUTING 44Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 45. Introduction  Unlike shared-bus architectures such as PCI and PCI-X, where traffic is visible to every device on the bus and routing is mainly a concern of bridges, PCI Express devices depend on one another to accept traffic or forward it in the direction of the ultimate recipient.  As traffic arrives at the inbound side of a link interface (called the ingress port), the device checks for errors, then makes one of three decisions: 1) Accept the traffic and use it internally. 2) Forward the traffic to the appropriate outbound (egress) port. 3) Reject the traffic because it is neither the intended target nor an interface to it (note that there are also other reasons why traffic may be rejected). 45Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 46. Introduction Multi-Port PCI Express Devices Have Routing Responsibilities 46Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 47. Receivers Check For Three Types of Link Traffic  Assuming a link is fully operational, the physical layer receiver interface of each device is prepared to monitor the logical idle condition and detect the arrival of the three types of link traffic: 1) Ordered Sets 2) DLLPs 3) TLPs  Using control (K) symbols which accompany the traffic to determine framing boundaries and traffic type, PCI Express devices then make a distinction between traffic which is local to the link vs. traffic which may require routing to other links (e.g. TLPs).  Local link traffic, which includes Ordered Sets and Data Link Layer Packets (DLLPs), isn’t forwarded and carries no routing information.  Transaction Layer Packets (TLPs) can and do move from link to link, using routing information contained in the packet headers. 47Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
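The receiver's first decision, distinguishing TLPs, DLLPs, and Ordered Sets by their framing control symbols, can be sketched as below. The byte values used are the commonly documented 8b codes for STP (K27.7), SDP (K28.2), and COM (K28.5) prior to 8b/10b encoding; treat them as illustrative constants:

```c
#include <stdint.h>

/* Traffic classification by the control (K) symbol that opens it. */
enum traffic {
    TRAFFIC_TLP, TRAFFIC_DLLP, TRAFFIC_ORDERED_SET, TRAFFIC_UNKNOWN
};

enum traffic classify(uint8_t k_symbol)
{
    switch (k_symbol) {
    case 0xFB: return TRAFFIC_TLP;          /* STP: Start of TLP  */
    case 0x5C: return TRAFFIC_DLLP;         /* SDP: Start of DLLP */
    case 0xBC: return TRAFFIC_ORDERED_SET;  /* COM: Ordered Set   */
    default:   return TRAFFIC_UNKNOWN;      /* handled as an error */
    }
}
```

Only the TLP case ever involves routing to another link; DLLPs and Ordered Sets are consumed locally.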
  • 48. Multi-port Devices Assume the Routing Burden  It should be apparent in Figure that devices with multiple PCI Express ports are responsible for handling their own traffic as well as forwarding other traffic between ingress ports and any enabled egress ports. Also note that while peer-to-peer transaction support is required of switches, it is optional for a multi-port Root Complex. It is up to the system designer to account for peer-to-peer traffic when selecting devices and laying out a motherboard. 48Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 49. Endpoints Have Limited Routing Responsibilities  It should also be apparent in Figure that endpoint devices have a single link interface and lack the ability to route inbound traffic to other links. For this reason, and because they don’t reside on shared busses, endpoints never expect to see ingress port traffic which is not intended for them (this is different than shared-bus PCI(X), where devices commonly decode addresses and commands not targeting them). Endpoint routing is limited to accepting or rejecting transactions presented to them. 49Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 50. System Routing Strategy Is Programmed  Before transactions can be generated by a requester, accepted by the completer, and forwarded by any devices in the path between the two, all devices must be configured to enforce the system transaction routing scheme. Routing is based on traffic type, system memory and IO address assignments, etc. In keeping with PCI plug-and-play configuration methods, each PCI express device is discovered, memory and IO address resources are assigned to them, and switch/bridge devices are programmed to forward transactions on their behalf. Once routing is programmed, bus mastering and target address decoding are enabled. Thereafter, devices are prepared to generate, accept, forward, or reject transactions as necessary. 50Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 51. Two Types of Local Link Traffic  Local traffic occurs between the transmit interface of one device and the receive interface of its neighbour for the purpose of managing the link itself.  This traffic is never forwarded or flow controlled; when sent, it must be accepted.  Local traffic is further classified as Ordered Sets exchanged between the Physical Layers of two devices on a link or Data Link Layer packets (DLLPs) exchanged between the Data Link Layers of the two devices. 51Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 52. Ordered Sets  These are sent by each physical layer transmitter to the physical layer of the corresponding receiver to initiate link training, compensate for clock tolerance, or transition a link to and from the Electrical Idle state. As indicated in Table, there are five types of Ordered Sets. Each ordered set is constructed of 10-bit control (K) symbols that are created within the physical layer. 52Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 53. Ordered Sets Ordered Set Types 53Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 54. Ordered Sets PCI Express Link Local Traffic: Ordered Sets 54Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 55. P C I : Please Calm down Immediately We are taking a break 55Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 56. PACKET – BASED TRANSACTIONS 1Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 57. Introduction to the Packet- Based Protocol  With the exception of the logical idle indication and physical layer Ordered Sets, all information moves across an active PCI Express link in fundamental chunks called packets, which consist of 10-bit control (K) and data (D) symbols. The two major classes of packets exchanged between two PCI Express devices are high level Transaction Layer Packets (TLPs), and low-level link maintenance packets called Data Link Layer Packets (DLLPs). Collectively, the various TLPs and DLLPs allow two devices to perform memory, IO, and Configuration Space transactions reliably and use messages to initiate power management events, generate interrupts, report errors, etc.  Figure depicts TLPs and DLLPs on a PCI Express link. 2Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 58. Why Use A Packet-Based Transaction Protocol TLP And DLLP Packets 3Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 59. Why Use A Packet-Based Transaction Protocol Packet Formats Are Well Defined  Each PCI Express packet has a known size and format, and the packet header (positioned at the beginning of each DLLP and TLP) indicates the packet type and the presence of any optional fields. The size of each packet field is either fixed or defined by the packet type. The size of any data payload is conveyed in the TLP header Length field. Once a transfer commences, there are no early transaction terminations by the recipient. This structured packet format makes it possible to insert additional information into the packet into prescribed locations, including framing symbols, CRC, and a packet sequence number (TLPs only). 4Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
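As a small example of the fixed-format principle, the data payload size of a TLP can be computed directly from the header. This sketch assumes the common layout in which the 10-bit Length field occupies the low bits of the first header DWORD and counts 4-byte DWORDs, with an encoding of 0 meaning the 1024-DW (4 KB) maximum:

```c
#include <stdint.h>

/* Payload size in bytes from the first TLP header DWORD.
 * Length is a 10-bit count of DWORDs; 0 encodes 1024 DW (4 KB). */
unsigned payload_bytes(uint32_t header_dw0)
{
    unsigned len_dw = header_dw0 & 0x3FF;   /* 10-bit Length field */
    if (len_dw == 0)
        len_dw = 1024;                      /* 0 encodes the maximum */
    return len_dw * 4;                      /* DWORDs -> bytes */
}
```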
  • 60. Framing Symbols Indicate Packet Boundaries  Each TLP and DLLP packet sent is framed with a Start and End control symbol, clearly defining the packet boundaries to the receiver. Note that the Start and End control (K) symbols appended to packets by the transmitting device are 10 bits each. A PCI Express receiver must properly decode a complete 10 bit symbol before concluding link activity is beginning or ending. Unexpected or unrecognized control symbols are handled as errors. 5Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 61. Transaction Layer Packets  High-level transactions originate at the device core of the transmitting device and terminate at the core of the receiving device. The Transaction Layer is the starting point in the assembly of outbound Transaction Layer Packets (TLPs), and the end point for disassembly of inbound TLPs at the receiver. Along the way, the Data Link Layer and Physical Layer of each device contribute to the packet assembly and disassembly. 6Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 62. Transaction Layer Packets PCI Express Layered Protocol And TLP Assembly/Disassembly 7Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 63. CRC Protects Entire Packet  Unlike the side-band parity signals used by PCI devices during the address and each data phase of a transaction, the in-band 16-bit or 32-bit PCI Express CRC value “protects” the entire packet (other than framing symbols). In addition to CRC, TLP packets also have a packet sequence number appended to them by the transmitter so that if an error is detected at the receiver, the specific packet(s) which were received in error may be resent. The transmitter maintains a copy of each TLP sent in a Retry Buffer until it is checked and acknowledged by the receiver. This TLP acknowledgement mechanism (sometimes referred to as the Ack/Nak protocol) forms the basis of link-level TLP error correction; it is especially important in deep topologies, where a device may be many links away from the host and CPU intervention on every error would otherwise be needed. 8Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
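The detect-and-compare idea behind the link CRC can be illustrated with a generic reflected CRC-32. Note that this is not the exact PCI Express LCRC computation: LCRC is also a 32-bit CRC, but its seed and bit-ordering rules are defined by the specification; this sketch only demonstrates the mechanism of computing a check value over the whole packet:

```c
#include <stdint.h>
#include <stddef.h>

/* Generic reflected CRC-32 (polynomial 0xEDB88320), shown only to
 * illustrate whole-packet CRC checking; the real LCRC bit ordering
 * and seed come from the PCI Express specification. */
uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

The receiver recomputes the CRC over the received bytes and compares it with the transmitted value; any mismatch triggers a NAK and a replay from the retry buffer.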
  • 64. ACK / NAK PROTOCOL 9Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 65. Introduction  ‘Reliable’ transport of TLPs from one device to another across the Link. The use of ACK DLLPs to confirm good reception of TLPs and of NAK DLLPs to indicate reception of TLPs in error is explained. 10Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 66. Reliable Transport of TLPs Across Each Link  The function of the Data Link Layer (shown in Figure 5-1 on page 210) is twofold: • ‘Reliable’ transport of TLPs from one device to another device across the Link. • The receiver’s Transaction Layer should receive TLPs in the same order that the transmitter sent them. The Data Link Layer must preserve this order despite any occurrence of errors that require TLPs to be replayed (retried). 11Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 67. Reliable Transport of TLPs Across Each Link Data Link Layer 12Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 68. Reliable Transport of TLPs Across Each Link Overview of the ACK/NAK Protocol 13Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 69. Elements of the ACK/NAK Protocol Packet order is maintained by the transmitter’s and receiver’s Transaction Layer. Elements of the ACK/NAK Protocol 14Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
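A minimal sketch of the transmitter's side of the Ack/Nak protocol: each TLP waits in a retry buffer, indexed by its 12-bit sequence number, and an ACK retires every TLP up to and including the acknowledged sequence. Sequence wrap-around handling is simplified here, and the flag array stands in for the stored TLPs themselves:

```c
#include <stdint.h>

#define SEQ_MOD 4096   /* sequence numbers wrap at 2^12 */

/* 1 = TLP with this sequence number still awaiting ACK. */
static uint8_t pending[SEQ_MOD];

void send_tlp(uint16_t seq) { pending[seq % SEQ_MOD] = 1; }

/* An ACK for sequence N retires oldest..N inclusive. */
void handle_ack(uint16_t oldest_unacked, uint16_t acked_seq)
{
    for (uint16_t s = oldest_unacked % SEQ_MOD; ;
         s = (uint16_t)((s + 1) % SEQ_MOD)) {
        pending[s] = 0;
        if (s == acked_seq % SEQ_MOD)
            break;
    }
}
```

A NAK (not shown) would instead replay every still-pending TLP from the buffer in its original order.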
  • 70. QOS / TCS / VCS AND ARBITRATION 15Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 71. Quality of Service  Quality of Service (QoS) is a generic term that normally refers to the ability of a network or other entity (in our case, PCI Express) to provide predictable latency and bandwidth.  Note that QoS can only be supported when the system and device-specific software is PCI Express aware.  QoS can involve many elements of performance including: 1) Transmission rate 2) Effective Bandwidth 3) Latency 4) Error rate 5) Other parameters that affect performance 16Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 72. Quality of Service  Several features of PCI Express architecture provide the mechanisms that make QoS achievable. The PCI Express features that support QoS include: • Traffic Classes (TCs) • Virtual Channels (VCs) • Port Arbitration • Virtual Channel Arbitration • Link Flow Control  PCI Express uses these features to support two general classes of transactions that can benefit from the PCI Express implementation of QoS. 17Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 73. Quality of Service Isochronous Transactions  These transactions require a constant bus bandwidth at regular intervals along with guaranteed latency.  Isochronous transactions are most often used when a synchronous connection is required between two devices. Iso (same) + chronous (time) 18Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 74. Quality of Service Asynchronous Transactions  This class of transactions involves a wide variety of applications that have widely varying requirements for bandwidth and latency. QoS can provide the more demanding applications (those requiring higher bandwidth and shorter latencies) with higher priority than the less demanding applications. In this way, software can establish a hierarchy of traffic classes for transactions that permits differentiation of transaction priority based on their requirements. The specification refers to this capability as differentiated services. 19Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 75. Isochronous Transaction Support Synchronous Versus Isochronous Transactions Example Application of Isochronous Transaction 20Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 76. Isochronous Transaction Support Isochronous Transaction Management  Management of an isochronous communications channel is based on a Traffic Class (TC) value and an associated Virtual Channel (VC) number that software assigns during initialization. Hardware components including the Requester of a transaction and all devices in the path between the requester and completer are configured to transport the isochronous transactions from link to link via a high-priority virtual channel. The requester initiates isochronous transactions that include a TC value representing the desired QoS. 21Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 77. Isochronous Transaction Management  The Requester injects isochronous packets into the fabric at the required rate (service interval), and all devices in the path between the Requester and Completer must be configured to support the transport of the isochronous transactions at the specified interval. Any intermediate device along the path must convert the TC to the associated VC used to control transaction arbitration. This arbitration results in the desired bandwidth and latency for transactions with the assigned TC. Note that the TC value remains constant for a given transaction while the VC number may change from link to link. 22Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
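The TC-to-VC conversion performed at each link can be modeled as a per-port lookup table that software programs at initialization. The structure below is an illustration, not a spec-defined register layout; it shows the key point that the TC carried in the TLP header is constant while the VC chosen may differ from link to link:

```c
#include <stdint.h>

/* One port's software-programmed TC-to-VC map: which of the 8
 * Traffic Classes travels on which Virtual Channel at this link. */
struct port { uint8_t tc_to_vc[8]; };

uint8_t vc_for_tc(const struct port *p, uint8_t tc)
{
    return p->tc_to_vc[tc & 7];   /* TC is a 3-bit value */
}
```

Two switches along the same path may legitimately return different VC numbers for the same TC, as long as each delivers the bandwidth and latency the TC's QoS demands.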
  • 78. Isochronous Transaction Support Differentiated Services  Various types of asynchronous traffic (all traffic other than isochronous) have different priority from the system perspective.  For example, ethernet traffic requires higher priority (smaller latencies) than mass storage transactions.  PCI Express software can establish different TC values and associated virtual channels and can set up the communications paths to ensure different delivery policies are established as required.  Note that the specification does not define specific methods for identifying delivery requirements or the policies to be used when setting up differentiated services. 23Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 79. Perspective on QOS/TC/VC and Arbitration Traffic Classes and Virtual Channels  During initialization a PCI Express device-driver communicates the levels of QoS that it desires for its transactions, and the operating system returns TC values that correspond to the QoS requested.  The TC value ultimately determines the relative priority of a given transaction as it traverses the PCI Express fabric. 24Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 80. FLOW CONTROL 25Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 81. FLOW CONTROL Introduction  Flow control guarantees that transmitters will never send Transaction Layer Packets (TLPs) that the receiver can’t accept.  This prevents receive buffer over-runs and eliminates the need for inefficient disconnects, retries, and wait-states on the link. 26Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 82. Flow Control Concept  Before a transaction packet can be sent across a link to the receiving port, the transmitting port must verify that the receiving port has sufficient buffer space to accept the transaction to be sent.  If the transaction is rejected due to insufficient buffer space, the transaction is resent (retried) until the transaction completes.  This procedure can severely reduce the efficiency of a bus by wasting bus bandwidth when other transactions are ready to be sent.  The Flow Control mechanism would be ineffective if only one transaction stream were pending transmission across a link.  PCI Express improves link efficiency by implementing multiple flow-control buffers for separate transaction streams. 27Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 83. Flow Control Concept  If the Flow Control buffer for one VC is full, the transmitter can advance to another VC buffer and send transactions associated with it.  The link Flow Control mechanism uses a credit-based mechanism that allows the transmitting port to check buffer space availability at the receiving port.  During initialization each receiver reports the size of its receive buffers (in Flow Control credits) to the port at the opposite end of the link.  The receiving port continues to update the transmitting port regularly by transmitting the number of credits that have been freed up.  This is accomplished via Flow Control DLLPs. 28Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
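The credit gate described above can be sketched with two cumulative counters, as below. Real PCI Express flow control tracks separate credit pools per VC and per TLP type (posted, non-posted, and completion headers and data); this single-counter model is a deliberate simplification:

```c
#include <stdbool.h>
#include <stdint.h>

/* Transmit-side credit state for one flow-control buffer type. */
struct fc_counter {
    uint32_t advertised;  /* cumulative credits granted by receiver */
    uint32_t consumed;    /* cumulative credits used by transmitter */
};

/* A TLP may be sent only when enough credits remain. */
bool can_send(const struct fc_counter *fc, uint32_t needed)
{
    return fc->advertised - fc->consumed >= needed;
}

/* An UpdateFC DLLP carries the receiver's new cumulative limit. */
void on_fc_update(struct fc_counter *fc, uint32_t new_total)
{
    fc->advertised = new_total;
}
```

Because both counters are cumulative, the receiver never has to report exact buffer occupancy; freeing buffer space simply raises the advertised limit via the next Flow Control DLLP.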
  • 84. Flow Control Concept Location of Flow Control Logic 29Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 85. INTERRUPTS 30Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 86. Two Methods of Interrupt Delivery  When a native PCI Express function depends on interrupts to invoke its device driver, Message Signaled Interrupts (MSI) must be used.  However, in the event that a device connecting to a PCI Express link cannot use MSIs (i.e., legacy devices), an alternate mechanism is defined. 31Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 87. Two Methods of Interrupt Delivery Native PCI Express Interrupt Delivery  A Message Signaled Interrupt is not a PCI Express Message; instead, it is simply a Memory Write transaction.  A memory write associated with an MSI can be distinguished from other memory writes only by the address location it targets, which is reserved by the system for interrupt delivery. 32Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 88. Two Methods of Interrupt Delivery Legacy PCI Interrupt Delivery  Legacy functions use one of the interrupt lines to signal an interrupt.  An INTx# signal is asserted to request interrupt service and deasserted when the interrupt service accesses a device-specific register, thereby indicating the interrupt is being serviced.  PCI Express defines in-band messages that act as virtual INTx# wires, which target the interrupt controller located typically within the Root Complex. 33Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 89. Two Methods of Interrupt Delivery Native PCI Express and Legacy PCI Interrupt Delivery 34Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 90. Message Signaled Interrupts  Message Signaled Interrupts (MSIs) are delivered to the Root Complex via memory write transactions. The MSI Capability register provides all the information that the device requires to signal MSIs. This register is set up by configuration software and includes the following information: • Target memory address • Data Value to be written to the specified address location • The number of messages that can be encoded into the data 35Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
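A rough model of what software programs through the MSI Capability register, and how a device derives the memory-write data for a given vector. The struct and field names here are descriptive only; the actual register layout is defined by the specification (see the figures that follow):

```c
#include <stdint.h>

/* Fields configuration software programs for MSI delivery;
 * the names are illustrative, not register names from the spec. */
struct msi_config {
    uint64_t address;     /* system-reserved interrupt target address */
    uint16_t data;        /* base data value to write                 */
    uint8_t  num_vectors; /* messages allocated: power of two, 1..32  */
};

/* The MSI is a plain memory write of (data | vector) to address;
 * the low bits of the data value encode the vector number. */
uint16_t msi_data_for_vector(const struct msi_config *c, uint8_t vec)
{
    return (uint16_t)(c->data | (vec & (c->num_vectors - 1)));
}
```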
  • 91. Message Signaled Interrupts 64-bit MSI Capability Register Format 36Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 92. Message Signaled Interrupts 32-bit MSI Capability Register Set Format 37Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 93. ERROR DETECTION AND HANDLING 38Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 94. ERROR DETECTION AND HANDLING Background  The original PCI bus implementation provides for basic parity checks on each transaction as it passes between two devices residing on the same bus.  The PCI architecture provides a method for reporting the following types of errors: • data parity errors • data parity errors during multicast transactions (special cycles) • address and command parity errors • other types of errors 39Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 95. Background  Errors reported via PERR# are considered potentially recoverable.  How the errors reported via PERR# are handled is left up to the implementer.  Error handling may involve only hardware, device-specific software, or system software.  Errors signaled via SERR# are reported to the system and handled by system software.  PCI-X uses the same error reporting signals as PCI, but defines specific error handling requirements depending on whether device-specific error handling software is present. 40Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 96. Introduction to PCI Express Error Management PCI Express defines a variety of mechanisms used for checking errors, reporting those errors and identifying the appropriate hardware and software elements for handling these errors. 41Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 97. Introduction to PCI Express Error Management PCI Express Error Checking Mechanisms Each layer of the PCI Express interface includes error checking capability as described in the following sections. 42Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 98. PCI Express Error Checking Mechanisms Transaction Layer Errors  The transaction layer checks are performed only by the Requestor and Completer. Packets traversing switches do not perform any transaction layer checks. Checks performed at the transaction layer include: • ECRC check failure (optional check based on end-to-end CRC) • Malformed TLP (error in packet format) • Completion Time-outs during split transactions • Flow Control Protocol errors (optional) • Unsupported Requests • Data Corruption (reported as a poisoned packet) • Completer Abort (optional) • Unexpected Completion (completion does not match any Request pending completion) • Receiver Overflow (optional check) 43Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 99. PCI Express Error Checking Mechanisms Data Link Layer Errors  Link layer error checks occur within a device involved in delivering the transaction between the requester and completer functions. This includes the requesting device, intermediate switches, and the completing device. Checks performed at the link layer include: • LCRC check failure for TLPs • Sequence Number check for TLPs • LCRC check failure for DLLPs • Replay Time-out • Replay Number Rollover • Data Link Layer Protocol errors 44Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 100. PCI Express Error Checking Mechanisms Physical Layer Errors  Physical layer error checks are also performed by all devices involved in delivering the transaction, including the requesting device, intermediate switches, and the completing device. Checks performed at the physical layer include: • Receiver errors (optional) • Training errors (optional) 45Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 101. Error Reporting Mechanisms  PCI Express provides three mechanisms for establishing the error reporting policy.  These mechanisms are controlled and reported through configuration registers mapped into three distinct regions of configuration space. • PCI-compatible Registers (required) — 1) This error reporting mechanism provides backward compatibility with existing PCI compatible software and is enabled via the PCI configuration Command Register. 2) This approach requires that PCI Express errors be mapped to PCI compatible error registers. 46Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 102. Error Reporting Mechanisms • PCI Express Capability Registers (required) — 1) This mechanism is available only to software that has knowledge of PCI Express. 2) This required error reporting is enabled via the PCI Express Device Control Register mapped within PCI-compatible configuration space. • PCI Express Advanced Error Reporting Registers (optional) — 1) This mechanism involves registers mapped into the extended configuration address space. 2) PCI Express compatible software enables error reporting for individual errors via the Error Mask Register. 47Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 103. Error Reporting Mechanisms Location of PCI Express Error-Related Configuration Registers 48Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 104. Error Handling Mechanisms • Correctable errors — handled by hardware • Uncorrectable errors-nonfatal — handled by device-specific software • Uncorrectable errors-fatal — handled by system software 49Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
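The three handling classes above map directly to the agent responsible for each; a trivial C encoding of that mapping (the enum names are our own shorthand, not spec terms):

```c
/* Error severity classes and the agent that handles each. */
enum err_class { ERR_CORRECTABLE, ERR_NONFATAL, ERR_FATAL };
enum err_agent { AGENT_HARDWARE, AGENT_DEVICE_SW, AGENT_SYSTEM_SW };

enum err_agent handler_for(enum err_class c)
{
    switch (c) {
    case ERR_CORRECTABLE: return AGENT_HARDWARE;   /* no SW needed   */
    case ERR_NONFATAL:    return AGENT_DEVICE_SW;  /* driver-level   */
    default:              return AGENT_SYSTEM_SW;  /* fatal: system  */
    }
}
```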
  • 105. PHYSICAL LAYER LOGIC 50Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 106. PHYSICAL LAYER LOGIC  Byte Striping and Un-Striping logic, Scrambler and De-Scrambler, 8b/10b Encoder and Decoder, Elastic Buffers and more. The transmit logic of the Physical Layer essentially processes packets arriving from the Data Link Layer, then converts them into a serial bit stream. 51Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 107. Physical Layer Overview Physical Layer 52Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 108. Physical Layer Overview Logical and Electrical Sub-Blocks of the Physical Layer 53Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 109. Transmit Logic Overview Physical Layer Details 54Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 110. Transmit Logic Overview  Figure shows the elements that make up the transmit logic: • a multiplexer (mux), • byte striping logic (only necessary if the link implements more than one data lane), • scramblers, • 8b/10b encoders, • and parallel-to-serial converters.  TLPs and DLLPs from the Data Link layer are clocked into a Tx (transmit) Buffer.  With the aid of a multiplexer, the Physical Layer frames the TLPs or DLLPs with Start and End characters. 55Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 111. Transmit Logic Overview  These characters are framing symbols which the receiver device uses to detect start and end of packet. The framed packet is sent to the Byte Striping logic which multiplexes the bytes of the packet onto the Lanes. One byte of the packet is transferred on one Lane, the next byte on the next Lane and so on for the available Lanes. The Scrambler uses an algorithm to pseudo-randomly scramble each byte of the packet. The Start and End framing bytes are not scrambled. 56Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 112. Transmit Logic Overview  Scrambling eliminates repetitive patterns in the bit stream. Repetitive patterns result in large amounts of energy concentrated in discrete frequencies, which leads to significant EMI noise generation. Scrambling spreads energy over a frequency range, hence minimizing average EMI noise generated. The scrambled 8-bit characters (8b characters) are encoded into 10-bit symbols (10b symbols) by the 8b/10b Encoder logic. And yes, there is a 20% loss in transmission performance due to the expansion of each 8-bit byte into a 10-bit symbol. 57Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 113. Transmit Logic Overview A Character is defined as the 8-bit un-encoded byte of a packet. A Symbol is defined as the 10-bit encoded equivalent of the 8-bit character. The 10b symbols are converted to a serial bit stream by the Parallel-to-Serial converter. This logic uses a 2.5 GHz clock to serially clock the packets out on each Lane. The serial bit stream is sent to the electrical sub-block which differentially transmits the packet onto each Lane of the Link. 58Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
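The scramble/descramble pair described on these slides is a synchronized XOR keystream: because the LFSR on each side is seeded identically and advances in lockstep, applying the same operation a second time recovers the original bytes. Below is a minimal byte-serial sketch in Python. The tap polynomial x^16 + x^5 + x^4 + x^3 + 1 is the one PCI Express specifies, but the exact bit ordering, symbol-skipping, and per-lane rules of the real spec are simplified away here; this only illustrates the self-inverse property, plus the 8b/10b bandwidth arithmetic (2.5 Gb/s per lane x 8/10 = 2.0 Gb/s, i.e. 250 MB/s usable).

```python
POLY = 0x0039   # low-order terms of x^16 + x^5 + x^4 + x^3 + 1 (Galois form)
SEED = 0xFFFF   # LFSR reset value

def scramble(data: bytes, seed: int = SEED) -> bytes:
    """XOR each byte with 8 successive LFSR feedback bits (toy model)."""
    state, out = seed, bytearray()
    for byte in data:
        key = 0
        for _ in range(8):                 # advance the LFSR one bit at a time
            feedback = (state >> 15) & 1
            state = (state << 1) & 0xFFFF
            if feedback:
                state ^= POLY
            key = (key << 1) | feedback
        out.append(byte ^ key)             # real HW skips Start/End framing bytes
    return bytes(out)

# Descrambling is the identical operation with an identically seeded LFSR.
payload = bytes(range(32))
wire = scramble(payload)
assert wire != payload                     # the stream really is scrambled
assert scramble(wire) == payload           # the inverse recovers the packet

# 8b/10b cost: 2.5 Gb/s raw per lane -> 2.0 Gb/s of data, i.e. 250 MB/s.
effective_MBps = 2.5e9 * (8 / 10) / 8 / 1e6
```

In the actual protocol the receiver keeps its LFSR in step with the transmitter's (e.g. both reset on defined symbols), which is what makes this simple XOR inversion work across a link.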
  • 114. Receive Logic Overview  Figure shows the elements that make up the receiver logic: • receive PLL, • serial-to-parallel converter, • elastic buffer, • 8b/10b decoder, • de-scrambler, • byte un-striping logic (only necessary if the link implements more than one data lane), • control character removal circuit, • and a packet receive buffer. As the data bit stream is received, the receiver PLL is synchronized to the clock frequency with which the packet was clocked out of the remote transmitter device. 59Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 115. Receive Logic Overview  The transitions in the incoming serial bit stream are used to re-synchronize the PLL circuitry and maintain bit and symbol lock while generating a clock recovered from the data bit stream.  The serial-to-parallel converter is clocked by the recovered clock and outputs 10b symbols.  The 10b symbols are clocked into the Elastic Buffer using the recovered clock associated with the receiver PLL.  The 10b symbols are converted back to 8b characters by the 8b/10b Decoder.  The Start and End characters that frame a packet are eliminated. 60Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 116. Receive Logic Overview  The 8b/10b Decoder also looks for errors in the incoming 10b symbols. For example, error detection logic can check for invalid 10b symbols or detect a missing Start or End character. The De-Scrambler reproduces the de-scrambled packet stream from the incoming scrambled packet stream. The De-Scrambler implements the inverse of the algorithm implemented in the transmitter Scrambler. The bytes from each Lane are un-striped to form a serial byte stream that is loaded into the receive buffer to feed to the Data Link layer. 61Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 117. ELECTRICAL PHYSICAL LAYER 62Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 118. Electrical Physical Layer Overview  This sub-block contains differential drivers (transmitters) and differential receivers (receivers).  The transmitter serializes outbound symbols on each Lane and converts the bit stream to electrical signals that have an embedded clock.  The receiver detects electrical signaling on each Lane and generates a serial bit stream that it de-serializes into symbols, and supplies the symbol stream to the logical Physical Layer along with the clock recovered from the inbound serial bit stream.  The drivers and receivers are short-circuit tolerant, making them ideally suited for hot insertion and removal events. 63Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 119. Electrical Physical Layer Overview Electrical Sub-Block of the Physical Layer 64Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 120. ESD and Short Circuit Requirements  All signals and power pins must withstand (without damage) a 2000V Electro-Static Discharge (ESD) using the human body model and 500V using the charged device model. The ESD requirement not only protects against electro-static damage, but facilitates support of surprise hot insertion and removal events. Transmitters and receivers are also required to be short-circuit tolerant. 65Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 121. Finished!!! Next session is how to use all this stuff we have been learning. 66Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd.
  • 122. Slide 1Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Getting to PCI Express But first some Background
  • 123. Slide 2Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Interfacing Processor and Peripherals  Overview [Diagram: processor with cache connected over a memory-I/O bus to main memory and to I/O controllers for disks, graphics output, and the network; devices signal the processor via interrupts]
  • 124. Slide 3Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Introduction  I/O often viewed as second class to processor design – Processor research is cleaner – System performance given in terms of processor – Courses often ignore peripherals – Writing device drivers is not fun  This is crazy - a computer with no I/O is pointless
  • 125. Slide 4Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Peripheral design  As with processors, characteristics of I/O driven by technology advances – E.g. properties of disk drives affect how they should be connected to the processor – PCs and supercomputers now share the same architectures, so I/O can make all the difference  Different requirements from processors – Performance – Expandability – Resilience
  • 126. Slide 5Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Peripheral performance  Harder to measure than for the processor – Device characteristics • Latency / Throughput – Connection between system and device – Memory hierarchy – Operating System  Assume 100 secs to execute a benchmark – 90 secs CPU and 10 secs I/O – If processors get 50% faster per year for the next 5 years, what is the impact?
  • 127. Slide 6Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Relative performance  CPU time + IO time = total time (% of IO time)  Year 0: 90 + 10 = 100 (10%)  Year 1: 60 + 10 = 70 (14%)  :  Year 5: 12 + 10 = 22 (45%)  !
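The table on this slide follows from simple arithmetic: a 50%-per-year speedup shrinks CPU time by a factor of 1.5 each year while the 10 seconds of I/O time stays fixed. A short sketch of that calculation:

```python
def workload_split(cpu_secs=90.0, io_secs=10.0, speedup=1.5, years=5):
    """Return (cpu, total, io_fraction) after `years` of CPU-only speedup."""
    cpu = cpu_secs / speedup ** years    # I/O time is untouched by CPU gains
    total = cpu + io_secs
    return cpu, total, io_secs / total

cpu5, total5, frac5 = workload_split()
# Year 5: roughly 12 + 10 = 22 seconds, with I/O now ~45% of the run time.
```

The punchline is Amdahl's Law in disguise: the part you do not speed up comes to dominate.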
  • 128. Slide 7Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd IO bandwidth  Measured in 2 ways depending on application – How much data can we move through the system in a given time • Important for supercomputers with large amounts of data for, say, weather prediction – How many IO operations can we do in a given time • An ATM transaction is a small amount of data but needs to be handled rapidly  So comparison is hard. Generally – Response time lowered by handling early – Throughput increased by handling multiple requests together
  • 129. Slide 8Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd I/O Performance Measures  I/O bandwidth (throughput) – amount of information that can be input (output) and communicated across an interconnect (e.g., a bus) to the processor/memory (I/O device) per unit time 1. How much data can we move through the system in a certain time? 2. How many I/O operations can we do per unit time?  I/O response time (latency) – the total elapsed time to accomplish an input or output operation – An especially important metric in real-time systems  Many applications require both high throughput and short response times
  • 130. Slide 9Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd I/O System Performance  Designing an I/O system to meet a set of bandwidth and/or latency constraints means – Finding the weakest link in the I/O system, the component that constrains the design • The processor and memory system • The underlying interconnection (e.g., bus) • The I/O controllers • The I/O devices themselves – (Re)configuring the weakest link to meet the bandwidth and/or latency requirements – Determining requirements for the rest of the components and (re)configuring them to support this latency and/or bandwidth
  • 131. Slide 10Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd I/O System Performance Example  A disk workload consisting of 64KB reads and writes where the user program executes 200,000 instructions per disk I/O operation and – a processor that sustains 3 billion instr/s and averages 100,000 OS instructions to handle an I/O operation – a memory-I/O bus that sustains a transfer rate of 1000 MB/s  The maximum I/O rate of the processor is: Instr execution rate / Instr per I/O = (3 x 10^9) / ((200 + 100) x 10^3) = 10,000 I/Os/sec  Each I/O reads/writes 64 KB, so the maximum I/O rate of the bus is: Bus bandwidth / Bytes per I/O = (1000 x 10^6) / (64 x 10^3) = 15,625 I/Os/sec
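The two divisions in this example can be checked directly; the smaller of the two rates is the weakest link, so this system is processor-limited:

```python
instr_rate = 3e9                        # processor sustains 3 billion instr/s
instr_per_io = 200_000 + 100_000        # user + OS instructions per I/O
bus_bw = 1000e6                         # memory-I/O bus, bytes/s
bytes_per_io = 64 * 1000                # 64 KB per read/write (slide uses 64 x 10^3)

proc_io_rate = instr_rate / instr_per_io     # 10,000 I/Os/sec
bus_io_rate = bus_bw / bytes_per_io          # 15,625 I/Os/sec
bottleneck = min(proc_io_rate, bus_io_rate)  # the processor constrains the design
```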
  • 132. Slide 11Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Input and Output Devices  I/O devices are incredibly diverse with respect to – Behavior – input, output or storage – Partner – human or machine – Data rate – the peak rate at which data can be transferred between the I/O device and the main memory or processor

Device | Behavior | Partner | Data rate (Mb/sec)
Keyboard | input | human | 0.0001
Mouse | input | human | 0.0038
Laser printer | output | human | 3.2000
Graphics display | output | human | 800.0000-8000.0000
Network | input or output | machine | 100.0000-1000.0000
Magnetic disk | storage | machine | 240.0000-2560.0000

(an 8-orders-of-magnitude range in data rate)
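The slide's note that device data rates span roughly 8 orders of magnitude is simply the spread between the slowest and fastest rows, which a one-liner confirms:

```python
import math

slowest, fastest = 0.0001, 8000.0        # keyboard vs. graphics display, Mb/sec
orders = math.log10(fastest / slowest)   # ~7.9, i.e. about 8 orders of magnitude
```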
  • 133. Slide 12Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Mouse  Communicates with – Pulses from LED – Increment / decrement counters  Mice have at least 1 button – Need click and hold  Movement is smooth, slower than processor – Polling – No submarining – Software configuration [Diagram: mouse movements from the initial position reported as +20 or -20 counts in X and Y]
  • 134. Slide 13Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Mouse guts [Images: mouse internals]
  • 135. Slide 14Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Hard disk  Rotating rigid platters with magnetic surfaces  Data read/written via head on armature – Think record player  Storage is non-volatile  Surface divided into tracks – Several thousand concentric circles  Track divided in sectors – 128 or so sectors per track
  • 136. Slide 15Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd [Diagram: disk platters, each surface divided into concentric tracks, each track divided into sectors]
  • 137. Slide 16Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Access time  Three parts 1. Perform a seek to position arm over correct track 2. Wait until desired sector passes under head. Called rotational latency or delay 3. Transfer time to read information off disk – Usually a sector at a time at 2~4 Mb / sec – Control is handled by a disk controller, which can add its own delays.
  • 138. Slide 17Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Calculating time  Seek time: – Measure max and divide by two – More formally: (sum of all possible seeks)/number of possible seeks  Latency time: – Average of complete spin – 0.5 rotations / spin speed (3600~5400 rpm) – 0.5 / (3600 / 60) – 0.0083 secs – 8.3 ms
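Putting the three parts of the previous slide together: total access time is seek plus average rotational latency plus transfer time plus any controller overhead. A sketch, using the slide's 3600 rpm figure; the 9 ms seek and the 512-byte sector at 4 MB/s are assumed illustrative values, not from the slides:

```python
def disk_access_secs(avg_seek_s, rpm, transfer_bytes, rate_bytes_s, controller_s=0.0):
    """Seek + average rotational latency (half a spin) + transfer + controller."""
    rotational = 0.5 / (rpm / 60.0)      # half a rotation; rpm converted to rev/s
    transfer = transfer_bytes / rate_bytes_s
    return avg_seek_s + rotational + transfer + controller_s

latency_3600 = 0.5 / (3600 / 60.0)       # 0.0083 s = 8.3 ms, as on the slide
t = disk_access_secs(avg_seek_s=0.009, rpm=3600, transfer_bytes=512,
                     rate_bytes_s=4_000_000)
```

Note how seek and rotational latency dominate: the 512-byte transfer itself takes only ~0.13 ms.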
  • 139. Slide 18Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Comparison  Currently, 7.25 Gb (7,424,000) per inch squared [Image: areal density comparison]
  • 140. Slide 19Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd More faking  Disk drive hides internal optimisations from external world [Image]
  • 141. Slide 20Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Disk Latency & Bandwidth Milestones Disk latency is one average seek time plus the rotational latency. Disk bandwidth is the peak transfer rate of formatted data from the media (not from the cache). In the time that the disk bandwidth doubles, the latency improves by a factor of only 1.2 to 1.4

 | CDC Wren | SG ST41 | SG ST15 | SG ST39 | SG ST37R
Speed (RPM) | 3600 | 5400 | 7200 | 10000 | 15000
Year | 1983 | 1990 | 1994 | 1998 | 2003
Capacity (Gbytes) | 0.03 | 1.4 | 4.3 | 9.1 | 73.4
Diameter (inches) | 5.25 | 5.25 | 3.5 | 3.0 | 2.5
Interface | ST-412 | SCSI | SCSI | SCSI | SCSI
Bandwidth (MB/s) | 0.6 | 4 | 9 | 24 | 86
Latency (msec) | 48.3 | 17.1 | 12.7 | 8.8 | 5.7
  • 142. Slide 21Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Media Bandwidth/Latency Demands  Bandwidth requirements – High quality video • Digital data = (30 frames/s) × (640 x 480 pixels) × (24-b color/pixel) = 221 Mb/s (27.648 MB/s) – High quality audio • Digital data = (44,100 audio samples/s) × (16-b audio samples) × (2 audio channels for stereo) = 1.4 Mb/s (0.175 MB/s) – Compression reduces the bandwidth requirements considerably  Latency issues – How sensitive is your eye (ear) to variations in video (audio) rates? – How can you ensure a constant rate of delivery? – How important is synchronizing the audio and video streams? • 15 to 20 ms early to 30 to 40 ms late tolerable
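Both bandwidth figures are straight multiplications, reproduced below (with 1 Mb/s taken as 10^6 bits/s, as on the slide):

```python
video_bps = 30 * (640 * 480) * 24        # frames/s x pixels x bits of color
audio_bps = 44_100 * 16 * 2              # samples/s x bits x stereo channels

video_Mbps = video_bps / 1e6             # ~221 Mb/s uncompressed
video_MBps = video_bps / 8 / 1e6         # ~27.6 MB/s uncompressed
audio_Mbps = audio_bps / 1e6             # ~1.4 Mb/s (CD-quality stereo)
```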
  • 143. Slide 22Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Magnetic Disk Examples (www.seagate.com)

Characteristic | Seagate ST37 | Seagate ST32 | Seagate ST94
Disk diameter (inches) | 3.5 | 3.5 | 2.5
Capacity (GB) | 73.4 | 200 | 40
# of surfaces (heads) | 8 | 4 | 2
Rotation speed (RPM) | 15,000 | 7,200 | 5,400
Transfer rate (MB/sec) | 57-86 | 32-58 | 34
Minimum seek (ms) | 0.2r-0.4w | 1.0r-1.2w | 1.5r-2.0w
Average seek (ms) | 3.6r-3.9w | 8.5r-9.5w | 12r-14w
MTTF (hours @ 25°C) | 1,200,000 | 600,000 | 330,000
Dimensions (inches) | 1"x4"x5.8" | 1"x4"x5.8" | 0.4"x2.7"x3.9"
GB/cu.inch | 3 | 9 | 10
Power: op/idle/sb (watts) | 20?/12/- | 12/8/1 | 2.4/1/0.4
GB/watt | 4 | 16 | 17
Weight (pounds) | 1.9 | 1.4 | 0.2
  • 144. Slide 23Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Buses: Connecting I/O devices  Interfacing subsystems in a computer system is commonly done with a bus: “a shared communication link, which uses one set of wires to connect multiple sub-systems”
  • 145. Slide 24Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Why a bus?  Main benefits: – Versatility: new devices easily added – Low cost: reusing a single set of wires many ways  Problems: – Creates a bottleneck – Tries to be all things to all subsystems  Comprised of – Control lines: signal requests and acknowledgements, and show what type of information is on the data lines – Data lines: data, destination / source address
  • 146. Slide 25Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Controlling a bus  As the bus is shared, need a protocol to manage usage  Bus transaction consists of – Sending the address – Sending / receiving the data  Note that in buses, we talk about what the bus does to memory – During a read, a bus will ‘receive’ data
  • 147. Slide 26Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Bus transaction 1 - disk write [Diagram: three steps (a, b, c) showing the processor, memory, and disks exchanging control and data over the bus]
  • 148. Slide 27Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Bus transaction 2 - disk read [Diagram: two steps (a, b) showing the transfer over the control and data lines]
  • 149. Slide 28Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Types of Bus  Processor-memory bus – Short and high speed – Matched to memory system (usually Proprietary)  I/O buses – Lengthy – Connected to a wide range of devices – Usually connected to the processor through a bus adapter  Backplane bus – Processors, memory and devices on single bus – Has to balance proc-memory with I/O-memory – Usually requires extra logic to do this
  • 150. Slide 29Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Bus type diagram [Diagram: a. a single backplane bus shared by processor, memory and I/O devices; b. a processor-memory bus with bus adapters out to separate I/O buses; c. a processor-memory bus with a bus adapter to a backplane bus, which in turn hosts adapters to I/O buses]
  • 151. Slide 30Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Synchronous and Asynchronous buses  Synchronous bus has a clock attached to the control lines and a fixed protocol for communicating that is relative to the pulse  Advantages – Easy to implement (CC1 read, CC5 return value) – Requires little logic (FSM to specify)  Disadvantages – All devices must run at same rate – If fast, cannot be long due to clock skew  Most proc-mem buses are clocked
  • 152. Slide 31Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Asynchronous buses  No clock, so it can accommodate a variety of devices (no clock = no skew)  Needs a handshaking protocol to coordinate different devices – Agreed steps to progress through by sender and receiver – Harder to implement - needs more control lines
  • 153. Slide 32Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Example handshake - device wants a word from memory [Timing diagram: ReadReq, Data, Ack and DataRdy lines, with transitions numbered 1 through 7]
  • 154. Slide 33Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd FSM control  The memory and the I/O device each run a small finite state machine; the numbered handshake steps are: 1) The I/O device puts the address on the data lines and asserts ReadReq. 2) Memory records the address from the data lines and asserts Ack; seeing Ack, the device releases the data lines and deasserts ReadReq. 3) Seeing ReadReq deasserted, memory drops Ack. 4) Memory puts the requested data on the data lines and asserts DataRdy. 5) The device reads the data from the data lines and asserts Ack. 6) Memory releases the data lines and deasserts DataRdy. 7) The device deasserts Ack, and both sides await a new I/O request.
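The handshake on the preceding slides can be mimicked in software. This sketch walks the signal transitions in order and records each one, so the resulting trace can be compared against the timing diagram; it is a sequential simplification of what are really two concurrent state machines.

```python
def handshake_read(memory, addr):
    """Simulate the asynchronous ReadReq/Ack/DataRdy handshake for one word."""
    log, lines = [], {"ReadReq": 0, "Ack": 0, "DataRdy": 0, "data": None}

    def drive(sig, val):
        lines[sig] = val
        log.append((sig, val))

    drive("data", addr); drive("ReadReq", 1)   # 1: device drives the address
    word = memory[lines["data"]]               # 2: memory latches the address...
    drive("Ack", 1)                            #    ...and acknowledges
    drive("data", None); drive("ReadReq", 0)   # 2: device releases the bus
    drive("Ack", 0)                            # 3: memory sees ReadReq low
    drive("data", word); drive("DataRdy", 1)   # 4: memory drives the word
    result = lines["data"]; drive("Ack", 1)    # 5: device reads, acknowledges
    drive("data", None); drive("DataRdy", 0)   # 6: memory releases the bus
    drive("Ack", 0)                            # 7: ready for a new request
    return result, log

value, trace = handshake_read({0x40: 0xDEAD}, 0x40)
```

Because every step waits on the previous signal change, no shared clock is needed; that is the whole point of the asynchronous protocol.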
  • 155. Slide 34Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Increasing bus bandwidth  Key factors – Data bus width: Wider = fewer cycles for transfer – Separate vs Multiplexed, data and address lines • Separating allows transfer in one bus cycle – Block transfer: Transfer multiple blocks of data in consecutive cycles without resending addresses and control signals etc.
  • 156. Slide 35Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Obtaining bus access  Need one, or more, bus masters to prevent chaos  Processor is always a bus master as it needs to access memory – Memory is always a slave  Simplest system has a single master (CPU)  Problems – Every transfer needs CPU time – As peripherals become smarter, this is a waste of time  But, multiple masters can cause problems
  • 157. Slide 36Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Bus Arbitration  Deciding which master gets to go next – Master issues ‘bus request’ and awaits ‘granted’  Two key properties – Bus priority (highest first) – Bus fairness (even the lowest get a go, eventually)  Arbitration is an overhead, so good to reduce it – Dedicated lines, grant lines, release lines etc.
  • 158. Slide 37Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Different arbitration schemes  Daisy chain: Bus grant line runs through devices from highest to lowest  Very simple, but cannot guarantee fairness [Diagram: the bus arbiter's Grant line is chained from Device 1 (highest priority) through Device n (lowest priority); the Request and Release lines are shared]
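Daisy-chain arbitration is easy to model: the grant propagates from the highest-priority device down the chain and stops at the first requester, which is exactly why fairness cannot be guaranteed. A sketch:

```python
def daisy_chain_arbitrate(requests):
    """Grant travels device 0 (highest priority) -> device n-1; first requester wins."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device
    return None                          # no device requested the bus

# Device 1 beats device 3 whenever both request:
winner = daisy_chain_arbitrate([False, True, False, True])

# Starvation: if device 0 requests every cycle, device 3 never gets a grant.
grants = [daisy_chain_arbitrate([True, False, False, True]) for _ in range(5)]
```

A centralized parallel arbiter (next slide) avoids this by seeing all requests at once and rotating priority.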
  • 159. Slide 38Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Centralised Arbitration  Centralised, parallel: All devices have separate connections to the bus arbiter – This is how the PCI backplane bus works (found in most PCs) – Can guarantee fairness – Arbiter can become congested
  • 160. Slide 39Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Distributed  Distributed arbitration by self selection:  Each device contains information about relative importance  A device places its ID on the bus when it wants access  If there is a conflict, the lower priority devices back down  Requires separate lines and complex devices  Used on the Macintosh II series (NuBus)
  • 161. Slide 40Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Collision detection  Distributed arbitration by collision detection:  Basically ethernet  Everyone tries to grab the bus at once  If there is a ‘collision’ everyone backs off a random amount of time
  • 162. Slide 41Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Bus standards  To ensure machine expansion and peripheral re-use, there are various standard buses – IBM PC-AT bus (de-facto standard) – SCSI (needs controller) – PCI (started by Intel, now maintained by the PCI-SIG standards body) – Ethernet  Bus bandwidth depends on size of transfer and memory speed
  • 163. Slide 42Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd PCI

Type | Backplane
Data width | 32-64
Address/data | Multiplexed
Bus masters | Multiple
Arbitration | Central parallel
Clocking | Synchronous, 33-66 MHz
Theoretical peak | 133-512 MB/sec
Achievable peak | 80 MB/sec
Max devices | 1024
Max length | 50 cm
Bananas | none
  • 164. Slide 43Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd My Old Macintosh [Diagram: processor and main memory attached to a PCI interface/memory controller; PCI connects I/O controllers for the SCSI bus (CDROM, disk, tape), stereo output, serial ports, the Apple desktop bus (input), graphics output, and Ethernet]
  • 165. Slide 44Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Example: The Pentium 4's Buses  System Bus ("Front Side Bus"): 64b x 800 MHz (6.4 GB/s), 533 MHz, or 400 MHz  Memory Controller Hub ("Northbridge"): DDR SDRAM main memory; graphics output (2.0 GB/s); Hub Bus (8b x 266 MHz) to the I/O Controller Hub ("Southbridge")  I/O Controller Hub ("Southbridge"): 2 serial ATAs (150 MB/s); 2 parallel ATAs (100 MB/s); 8 USBs (60 MB/s); Gbit ethernet (0.266 GB/s); PCI (32b x 33 MHz)
  • 166. Slide 45Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Buses in Transition  Companies are transitioning from synchronous, parallel, wide buses to asynchronous narrow buses – Reflection on wires and clock skew makes it difficult to (synchronously) use 16 to 64 parallel wires running at a high clock rate (e.g., ~400 MHz), so companies are transitioning to buses with a few one-way, asynchronous wires running at a very high clock rate (~2 GHz)

 | PCI | PCI Express | ATA | Serial ATA
Total # wires | 120 | 36 | 80 | 7
# data wires | 32-64 (2-way) | 2 x 4 (1-way) | 16 (2-way) | 2 x 2 (1-way)
Clock (MHz) | 33-133 | 635 | 50 | 150
Peak BW (MB/s) | 128-1064 | 300 | 100 | 150
  • 167. Slide 46Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd ATA Cable Sizes  Serial ATA cables (red) are much thinner than parallel ATA cables (green)
  • 168. Slide 47Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Giving commands to I/O devices  Processor must be able to address a device – Memory mapping: portions of memory are allocated to a device (Base address on a PC) • Different addresses in the space mean different things • Could be a read, write or device status address – Special instructions: Machine code for specific devices • Not a good idea generally
  • 169. Slide 48Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Communicating with the Processor  Polling – Process of periodically checking the status bits to see if it is time for the next I/O operation – Simplest way for a device to communicate (via a shared status register) – e.g., the mouse – Wasteful of processor time
  • 170. Slide 49Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Interrupts  Notify processor when a device needs attention (IRQ lines on a PC)  Just like exceptions, except for – Interrupt is asynchronous with program execution • Control unit only checks I/O interrupt at the start of each instruction execution – Need further information, such as the identity of the device that caused the interrupt and its priority • Remember the Cause Register?
  • 171. Slide 50Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Interrupt-Driven I/O  With I/O interrupts – Need a way to identify the device generating the interrupt – Can have different urgencies (so may need to be prioritized)  Advantages of using interrupts – Relieves the processor from having to continuously poll for an I/O event; user program progress is only suspended during the actual transfer of I/O data to/from user memory space  Disadvantage – special hardware is needed to – Cause an interrupt (I/O device) – Detect an interrupt, and save the necessary information to resume normal processing after servicing the interrupt (processor)
  • 172. Slide 52Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Interrupt-Driven Input [Diagram: the keyboard/receiver raises an input interrupt (1); the processor saves state (2.1), jumps to the input interrupt service routine (2.2), services the interrupt by moving the input into memory (2.3), and returns to the user program (2.4)]
  • 173. Slide 53Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Interrupt-Driven Output [Diagram: the display/transmitter raises an output interrupt (1); the processor saves state (2.1), jumps to the output interrupt service routine (2.2), services the interrupt (2.3), and returns to the user program (2.4)]
  • 174. Slide 54Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Transferring Data Between Device and Memory  We can do this with Interrupts and Polling – Works best with low bandwidth devices and keeping cost of controller and interface – Burden lies with the processor  For high bandwidth devices, we don’t want the processor worrying about every single block  Need a scheme for high bandwidth autonomous transfers
  • 175. Slide 55Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Direct Memory Access (DMA)  Mechanism for offloading the processor and having the device controller transfer data directly  Still uses interrupt mechanism, but only to communicate completion of transfer or error  Requires dedicated controller to conduct the transfer
  • 176. Slide 56Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Doing DMA  Essentially, DMA controller becomes bus master and sets up the transfer  Three steps – Processor sets up the DMA by supplying • device identity • Operation on device • Memory Address (source or destination) • Amount to transfer – DMA operates devices, supplies addresses and arbitrates bus – On completion, controller notifies processor
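The three steps above can be sketched as a tiny controller model: the processor programs the controller once, the controller moves the bytes on its own, and the only later processor involvement is the completion callback standing in for the interrupt. All names here are illustrative, not from any real DMA API.

```python
class DMAController:
    """Toy DMA engine: programmed once, then transfers without the CPU."""
    def __init__(self, memory: bytearray):
        self.memory = memory
        self.job = None

    def program(self, device_data: bytes, dest_addr: int, count: int, on_done):
        # Step 1: processor supplies device, operation, address, and length.
        self.job = (device_data, dest_addr, count, on_done)

    def run(self):
        # Step 2: controller masters the bus and moves the data itself.
        data, dest, count, on_done = self.job
        self.memory[dest:dest + count] = data[:count]
        on_done()           # step 3: completion "interrupt" back to the processor

ram = bytearray(64)
done = []
dma = DMAController(ram)
dma.program(b"sector-0", dest_addr=8, count=8, on_done=lambda: done.append(True))
dma.run()
```

A real controller would also arbitrate for the bus and raise an error interrupt on failure; the point here is only that the processor is idle between `program` and the callback.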
  • 177. Slide 57Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd DMA and the memory system  With DMA, the relationship between memory and processor is changed – DMA bypasses address translation and hierarchy  So, should DMA use virtual or physical addresses? – Virtual addresses: DMA must translate – Physical addresses: Hard to cross page boundary
  • 178. Slide 58Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd DMA address translation  Can provide the DMA with a small address translation table for pages - provided by OS at transfer time  Get the OS to break the transfer into chunks, each chunk relating to a single page  Regardless, the OS cannot relocate pages during transfer
  • 179. Slide 59Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd Evolution vs Revolution  Evolutionary approaches tend to be invisible to users except for – Lower cost and better performance  Revolutionary approaches require new languages and applications – Looks good on paper – Must be worth the effort – KCM
  • 180. Slide 60Subhash Iyer, Program Head, Soft Polynomials (I) Pvt. Ltd BUSSSS!!!!! Now let’s dive into the actual scenario