SlideShare une entreprise Scribd logo
1  sur  47
Télécharger pour lire hors ligne
OpenSM
Design Overview
August 2002

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

1
Infiniband Management
The Infiniband architecture requires a “subnet
manager” node to configure and maintain an
Infiniband fabric.
The subnet manager actually consists of several
modules. The term “subnet manager” is
commonly used to refer to the node that provides
both a Subnet Manager (SM) and the Subnet
Administrator (SA).
The SM configures subnet addresses and switch
routing tables.
The SA serves as the subnet’s database, hosting
queries from other subnet nodes.
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

2
OpenSM Features
Open source
Object oriented, extensible design implemented in C.
Embedded documentation with Robodoc.
Designed for portability to other platforms and Infiniband
interfaces. Current implementation runs in Linux userspace on top of the open source AL API.
Supports major SM features: multipathing (LMC > 0),
partitions, multicast groups, SM fail-over.
Guarantees optimal routing between any two end-points
regardless of subnet topology.
Supports commonly used SA queries.
Complete logging facilities to the MAD frame level.
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

3
Unsupported Features
The current developers do not plan to implement
the following features. These capabilities could
be added by other interested developers.
–
–
–
–
–

Switches with random forwarding tables.
Subnet routers.
Complete set of SA queries.
Virtual lanes.
GUI

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

4
The OpenSM Object Model

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

5
The OpenSM Object Model
OpenSM models most of its internal data
structures on the physical components in an IB
fabric, such as nodes, switches, partitions, etc.
OpenSM objects are linked together to model the
physical relationships on the IB subnet. For
example, a switch object contains port objects.
Port objects point to other port objects.
Most OpenSM objects provide functions that hide
the details of the object’s implementation.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

6
The Component Library
The Component Library is a portable, open source
library of useful functions and “containers”
implemented in C.
The Component Library provides lists, balanced
trees, variable sized arrays, etc.
OpenSM makes heavy use of the component
library. In the source, you will notice use of
functions starting with: cl_. These are calls to
the Component Library.
When in doubt, refer to the component library
documentation!

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

7
OpenSM Objects: A Top Down View
The “highest order” OpenSM object is the
OpenSM object itself.
The OpenSM object spans exactly one IB subnet.
The OpenSM object is built from a variety of other
objects:
–
–
–
–

One Subnet object
One SM object
One SA object
A Log object & many other supporting objects

The following slides will examine each of these
components in detail.
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

8
The Subnet Object
Name: osm_subn_t
Defined in: osm_subnet.h

The subnet object models the physical
composition of the subnet.
OpenSM builds the subnet object during
discovery and maintains it during subsequent
sweeps.
The subnet object is built from many other object
types, for example:
–
–
–
–
–

Node Objects – represent physical nodes
Switch Objects – represent physical switches
Router Objects – represent physical routers
Partition Objects – represent logical partitions
Multicast Group Objects – represent multicast groups.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

9
The Subnet Object

N
N

R
R

S
S

S
S

S
S

R
R

S
S
N
N

N
N
Master

N
N

Node

S

N
N

N
N
N
N

N

Switch

R

Router
Port / Link

N
N
N
N

This view shows
the internal
object-to-object
linkage that
models the
physical subnet.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

10
Subnet Pointer Tables
The subnet object contains tables of pointers of each type
of object in the subnet.
The pointer tables provide quick access to objects of a
particular type, be it a switch, node port, partition, etc.
Where appropriate, two pointer tables are provided for each
type of object:
– LidTable – pointers sorted by unicast LID
– GuidTable – pointers sorted by GUID

The pointer tables sorted by GUID are implemented using
cl_map (red-black trees).
The pointer tables sorted by LID are implemented using
cl_ptr_vector (variable size array of pointers).

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

11
Pointer Table Example - Nodes
Lid
1
2
3
…

Ptr

N
N
N
N

…

N
N
GUID
0x123
0x456
0x789
…

N
N

Ptr

S
S

S
S

S
S

R
R
…

R
R

S
S
N
N

N
N

N
N
N
N
N
N

Master

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

12
How does LMC > 0 look?
Lid
1
2
3
…
GUID
0x123
0x456
0x789
…

Ptr

N
N

N
N
N
N

…
Ptr

N
N

S
S

S
S

S
S

R
R
…

R
R

S
S
N
N

N
N

N
N
N
N
N
N

Master

LMC > 0 is not a special case!
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

13
Node Object

Name: osm_node_t
Defined in: osm_node.h
A node object represents a node on the subnet.
A node may be a Host Channel Adapter (HCA), a
Target Channel Adapter (TCA), a Switch or a
Router.
A node object is built from other object types:
– An array of Physical Port objects
– NodeInfo attribute
– NodeDescription attribute

Nodes “own” their corresponding Physical Port
objects. All other references to a Physical Port
point to a Physical Port inside a Node.
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

14
Node Objects

Node Object
Node Object

Node Object
Node Object

NodeInfo
Attribute

NodeInfo
Attribute

NodeDescription
Attribute

NodeDescription
Attribute

Physical Port 0
Physical Port 1
…
Physical Port N

Physical Port 0
Physical Port 1
…
Physical Port N

Node Objects are linked to their
neighbors though their Physical Port
Objects.
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

15
OpenSM Port Object Model
The object model for ports is somewhat complicated.
In switches, all ports share a common port GUID value, and
are thus logically grouped together. In all other node types,
ports have a unique GUID value.
The OpenSM object model handles this by creating two
types of port objects: Physical Ports and Logical Ports.
Physical Port objects are “owned” by their parent Node
object, be it a switch, HCA, etc. and are one-for-one with
physical ports on the subnet.
Logical Port objects contain 1 or more Physical Port objects
that all share the same GUID value. Logical Port objects are
one-to-many with physical ports on a switch, and one-toone for physical ports on any other type of node.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

16
Physical Port Objects
Name: osm_physp_t
Defined in: osm_port.h

Physical Port objects represent physical ports on
the subnet.
A Physical Port object points to its neighbor
Physical Port object on the other end of the wire.
A Physical Port object also contains a pointer to
its parent node object.
These two pointers allow functions to traverse the
subnet object in a manner that mirrors the
topology of the physical subnet.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

17
Logical Port Objects
Name: osm_port_t
Defined in: osm_port.h

A Port object represents one or more physical
ports that share the same GUID.
Port objects are “owned” by the Subnet object.
The Port object contains:
– An array of pointers to the corresponding Physical Port
objects.
– The GUID value of this port.
– A pointer to the Node object that owns the ports.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

18
Port Object Details
Port Object
Port Object

Physical Port Objects
Physical Port Objects

guid
p_node
num_ports

port_num
port_guid
port_info
p_node
p_remote

Port Objects are logical collections of Physical Port
objects that share the same port GUID. Switches
have many Physical Ports per GUID, while other
nodes have only 1 Physical Port per GUID.
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

19
Switch Object

Name: osm_switch_t
Defined in: osm_switch.h
A switch object represents a switch on the
subnet.
A switch object is built from other object types:
–
–
–
–

SwitchInfo attribute
Pointer to its parent node object
Switch forwarding table object
Lid Matrix object

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

20
LID Matrix Object
Name: osm_lid_matrix_t
Defined in: osm_matrix.h

The LID Matrix object contains the number of
hops to any LID value through any port in the
parent switch.
The LID Matrix is structured like a two
dimensional array, with LID values on one axis,
and port numbers on the other.
The LID Matrix expands as new LIDs are added to
the subnet.
The routing algorithms populate the hop count
values for each switch’s matrix.
Constructing a switch’s LID Matrix is a precursor
step to building its forwarding table.
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

21
Partition Objects
Partition objects model the logical grouping of physical
ports in an IB partition.
Partition objects contain general information about that
partition and a pointer table to port objects in the partition.

Partition N
Partition N

Port
GUID
0x203
0x789
0x922

N
N

Ptr

N
N
N
N
N
N

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

S
S
N
N
22
Partition Objects (continued)
Pointers to partition objects are stored in a the
partition pointer table indexed by P_KEY value.
Partition 0x123
Partition 0x123

P_KEY
0x123
0x456
0x789

Ptr
Partition 0x456
Partition 0x456

Partition 0x789
Partition 0x789

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

23
Multicast Groups
A multicast group is identified by its LID in the range 0xC0000xFFFE, which is assigned to all ports in that multicast group.
Multicast group objects look a lot like partition objects: They have
a pointer table to ports in the multicast group.

Multicast Group N
Multicast Group N

Port
GUID
0x123
0x456
0x789

Ptr

N
N

N
N
N
N
N
N

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

S
S
N
N
24
Multicast Groups (continued)
Pointers to multicast group objects are stored in a
the multicast group object pointer table indexed
by MLID.
Multicast Group 0xC001
Multicast Group 0xC001

MLID
0xC001
0xD0B5
0xE100

Ptr
Multicast Group 0xD0B5
Multicast Group 0xD0B5

Multicast Group 0xE100
Multicast Group 0xE100

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

25
Path Objects
SM/SA must provide the information in the
PathRecord attribute:
–
–
–
–

Path MTU
Rate
Packet lifetime
Preference value relative to other paths between the two
nodes in question.

There are too many paths in the fabric to
precompute all the path records. In a fabric with
N nodes, the number of paths is:
N(N-1)2 LMC
which grows large quickly!
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

26
Handling Paths
Path information must be determined on the fly by
the SM/SA.
The SM/SA traverses the subnet object, including
the local copies of the switch forwarding tables,
to accumulate path parameters in a path object.
The path object is discarded once SA has
delivered the information to the requester.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

27
Multicast Spanning Trees
Consider a
multicast group
that consists of
nodes A, B, C.
The multicast
spanning tree
for this group is
shown in red.
The spanning
tree covers 3
nodes, 3
switches and 10
Physical ports.

A
AN

N
N

N
N

S
S

N

N
N
R
R

S
S

R
R

S
S
S
S

N
N

N
N
Master

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

C
C

N
N
N
N
N
N

B
B
28
Multicast LID Routing
Multicast LID routing through a subnet is
configured as a single spanning tree, i.e. loops
are not allowed.
Multicast forwarding tables in switches have
limited size, so the spanning tree for each
multicast group should include only the minimum
number of switches.
Some multicast members can be “send only”
Non members cannot send to the group.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

29
Multicast Spanning Trees
A single subnet contains many spanning trees,
one per multicast group.
The subnet object models the physical
interconnections between subnet components,
whereas spanning trees are logical structures.
Normal unicast routes can be considered trivial
spanning trees.
Do spanning trees need first class treatment?
Can they just be implicit somehow?

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

30
Multicast Group Creation
Start with the set of ports belonging to the group.
Build the optimal spanning tree by traversing the
subnet object.
Convert the spanning tree into forwarding table
entries for the affected switches.
Configure the switches.
The same process works for both unicast and
multicast routing.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

31
OpenSM Methods & Algorithms

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

32
OpenSM Object Methods
Since we’re using ‘C’, we don’t have the concept
of a class with member functions.
The common work-around is to have pointers to
functions in a ‘C’ struct. Unfortunately, these
functions are never made inline by the compiler.
The OpenSM approach will be to use a ‘C’ struct
with separately defined functions (and hence able
to inline) that operate on that structure, as done in
the component library.
This method offers higher performance at the
expense of wordy function names.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

33
OpenSM Method Style
Follows the Component Library style.
OpenSM methods use the following form:
osm_<object type>_rest_of_name

For example:
ib_net16_t
osm_port_get_base_lid(
IN const osm_port_t* const p_port );
ib_net16_t
osm_switch_get_base_lid(
IN const osm_switch_t* const p_sw );

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

34
Paths & Switch Forwarding
Consider the routes
through the subnet
from A to B.
There are two
possible paths:
A–1–4–B
and
A–1–2–3–4–B
Obviously, we’d
like to take
advantage of the
shorter path!

A
AN

N
N

N
N

S
S

N

N
N
R
R

S
S

S
S

2

N
N

3

1

R
R

S
S
4

N
N

N
N
Master

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

N
N
N
N

B
B
35
IB LID Routing Basics
Leaf nodes are trivial.
Switches with only 1 inter-switch link are also trivial to
handle.
Switches with multiple inter-switch links to the same switch
can have statically load balanced paths.
All the interesting action pertains to switches with interswitch links to more than one other switch.
Switches with a random forwarding table have a “default
route” like in Fibre Channel switching.
Switches with a linear forwarding table do not have a default
route. Thus, a route for every LID in the subnet must be
programmed into the switch’s routing table.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

36
OpenSM Subnet Path Strategy
OpenSM hop-optimizes subnet routing.
– Routes to from any two nodes traverse the minimum
number of switches.
– Given a choice of equal hop-count paths, OpenSM
performs static load balancing using a simple weighted
round-robin scheme. Different weights are selected for
1x, 4x and 12x links.

LMC > 0 paths are also distributed using a
weighted round-robin scheme.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

37
Handling Path Records
Path information must be determined on the fly by
the SM/SA.
The SM/SA traverses the subnet object, including
the local copies of the switch forwarding tables,
to accumulate path parameters.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

38
OpenSM Subnet Configuration Overview
The process is asynchronous. As response
MADs arrive, OpenSM processes them and may
initiate further probing.
OpenSM throttles DR-MAD traffic so as to not
overload the VL15 buffering capability of the
switches.
The various parts of the discovery algorithm are
divided into relatively simple Dispatcher
“controllers”.
Initial discovery and subsequent sweeps share
common code where possible.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

39
OpenSM Subnet Configuration
OpenSM performs a multi-stage subnet
configuration:
1. Discovery & subnet inventory
2. LID assignment
3. Configure each switch’s LID Matrix for local
leaf nodes.
4. Configure each switch’s LID Matrix for
neighbor’s leaf nodes.
5. Configure each switch’s LID Matrix for
successively more remote nodes.
6. Build and download switch forwarding tables.
7. Bring ports to ARMED state as necessary.
8. Bring ports to ACTIVE state as necessary.
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

40
Subnet Discovery & Inventory
Objects representing subnet components are
created on the fly as OpenSM discovers each
component.
The OpenSM architecture is based on a central
asynchronous Dispatcher model.
‘Controllers’ around the Dispatcher perform the
actual processing. The processing performed by
any one Controller is narrow in scope.
The Dispatcher uses its worker threads to call the
processing function of each Controller.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

41
Subnet Discovery & Inventory
Subnet Object
Subnet Object

N NR
N NR
N S
N
N S SS N
NS S N
N
R
R
S NN
SN
NN
NN

Serialized
Access
Attribute Rcv
Attribute Rcv
Controller
Controller
Attribute Rcv
Attribute Rcv
Controller
Controller

Attribute Rcv
Attribute Rcv
Controller
Controller

Serialized
Access
Attribute Request
Attribute Request
Controller
Controller

cl_dispatcher
cl_dispatcher
VL15
Outbound
Queue

Assignment
Assignment
Controller
Controller

Assignment
Assignment
Controller
Controller

Inbound SMP
Inbound SMP
Controller
Controller
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

42
LID Assignment
OpenSM attempts to preserve existing LID
assignments when possible.
Base LID assignments on the subnet may follow
predictable patterns in practice, but OpenSM
treats every base LID as an arbitrary value.
For example:
– Switches are not assigned dedicated LID ranges.
– The base LIDs of leaf nodes on a switch may be
discontiguous

OpenSM assigns consecutive base LIDs when
possible.
– Helps minimizes the number of switch forwarding table
blocks needed to hold all possible routes.
*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

43
Multicast Group Creation
Start with the set of ports belonging to the group.
Build the optimal spanning tree by traversing the
subnet object.
Convert the spanning tree into forwarding table
entries for the affected switches.
Configure the switches.
The same process works for both unicast and
multicast routing.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

44
The Discovery Process
Discovery starts by requesting the NodeInfo
attribute of the node at “hop 0” which is the SM’s
own HCA.
The OpenSM object’s “sweep” method initiates
this NodeInfo Request.
Once started at hop 0, the discovery process is a
self-sustaining “chain reaction” the proceeds
until all nodes and ports have been discovered.
The GUID Tables are populated during the
discovery process.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

45
Discovery Completion
The discovery process is multi-threaded, thus
nodes may be discovered in different orders from
one sweep to the next.
Discovery is complete when the OpenSM
statistics block indicates there are no outstanding
QP0 MADs. The SM Mad Controller checks for
this condition each time it retires a QP0 MAD to
the MAD pool.
Once all QP0 MADs are off the subnet, the SM
Mad Controller posts a message to the
Dispatcher. This message causes the State
Manager to begin processing the discovered
nodes.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

46
LID Configuration
The LID Manager walks the container of
discovered Port objects and compares their
reported LID values with the Subnet Object’s LID
table.
The possible cases are:
– The Port reports the same LID range that OpenSM
believes it owns.
– The Port reports a LID range that overlaps the LID range
owned by another Port.
– The port reports a LID range that OpenSM thought was
unoccupied.
– The port does not yet have a LID range.

*All brand and product names are trademarks or registered trademarks of their respective companies.
Copyright© Intel Corporation 2002, All rights reserved.

47

Contenu connexe

En vedette

Temporalización 1º evaluación 3º eso 2013 2014 (2)
Temporalización 1º evaluación 3º eso 2013 2014 (2)Temporalización 1º evaluación 3º eso 2013 2014 (2)
Temporalización 1º evaluación 3º eso 2013 2014 (2)
Alfonsoooooooooo
 
Tunnel Boring Machine excavation stability
Tunnel Boring Machine excavation stabilityTunnel Boring Machine excavation stability
Tunnel Boring Machine excavation stability
Patrick Mok
 

En vedette (13)

Vipra june15
Vipra june15Vipra june15
Vipra june15
 
عرض خاص
عرض خاصعرض خاص
عرض خاص
 
Använda twitter för omvärldsbevakning
Använda twitter för omvärldsbevakningAnvända twitter för omvärldsbevakning
Använda twitter för omvärldsbevakning
 
1321000252049
13210002520491321000252049
1321000252049
 
Temporalización 1º evaluación 3º eso 2013 2014 (2)
Temporalización 1º evaluación 3º eso 2013 2014 (2)Temporalización 1º evaluación 3º eso 2013 2014 (2)
Temporalización 1º evaluación 3º eso 2013 2014 (2)
 
áNgel que es que es fa
áNgel que es que es faáNgel que es que es fa
áNgel que es que es fa
 
Héroes por un día completo
Héroes por un día completoHéroes por un día completo
Héroes por un día completo
 
how to make a good presentation
  how to make a good presentation  how to make a good presentation
how to make a good presentation
 
Crystal johnson resume17a
Crystal johnson resume17aCrystal johnson resume17a
Crystal johnson resume17a
 
タブレットで何ができるの?~さわってみようタブレット(2015/06/12版)~
タブレットで何ができるの?~さわってみようタブレット(2015/06/12版)~タブレットで何ができるの?~さわってみようタブレット(2015/06/12版)~
タブレットで何ができるの?~さわってみようタブレット(2015/06/12版)~
 
Safety manual
Safety manualSafety manual
Safety manual
 
Master Thesis Data Governance Maturity Model - Jan Merkus MSc
Master Thesis Data Governance Maturity Model - Jan Merkus MScMaster Thesis Data Governance Maturity Model - Jan Merkus MSc
Master Thesis Data Governance Maturity Model - Jan Merkus MSc
 
Tunnel Boring Machine excavation stability
Tunnel Boring Machine excavation stabilityTunnel Boring Machine excavation stability
Tunnel Boring Machine excavation stability
 

Similaire à You SON of a Bitch :P

17294_HiperSockets.pdf
17294_HiperSockets.pdf17294_HiperSockets.pdf
17294_HiperSockets.pdf
Eeszt
 

Similaire à You SON of a Bitch :P (20)

Spirit20090924poly
Spirit20090924polySpirit20090924poly
Spirit20090924poly
 
Userland Hooking in Windows
Userland Hooking in WindowsUserland Hooking in Windows
Userland Hooking in Windows
 
hardware description language power point presentation
hardware description language power point presentationhardware description language power point presentation
hardware description language power point presentation
 
한국통신학회 워크샵: SDN/NFV for Secure Services - Understanding Open Source SDN Contr...
한국통신학회 워크샵: SDN/NFV for Secure Services - Understanding Open Source SDN Contr...한국통신학회 워크샵: SDN/NFV for Secure Services - Understanding Open Source SDN Contr...
한국통신학회 워크샵: SDN/NFV for Secure Services - Understanding Open Source SDN Contr...
 
SDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual NetworkSDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual Network
 
Unlocking the SDN and NFV Transformation
Unlocking the SDN and NFV TransformationUnlocking the SDN and NFV Transformation
Unlocking the SDN and NFV Transformation
 
Introduction ciot workshop premeetup
Introduction ciot workshop premeetupIntroduction ciot workshop premeetup
Introduction ciot workshop premeetup
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
 
Dica ii chapter slides
Dica ii chapter slidesDica ii chapter slides
Dica ii chapter slides
 
The Role of Standards in IoT Security
The Role of Standards in IoT SecurityThe Role of Standards in IoT Security
The Role of Standards in IoT Security
 
17294_HiperSockets.pdf
17294_HiperSockets.pdf17294_HiperSockets.pdf
17294_HiperSockets.pdf
 
Understanding open max il
Understanding open max ilUnderstanding open max il
Understanding open max il
 
OpenStack Neutron behind the Scenes
OpenStack Neutron behind the ScenesOpenStack Neutron behind the Scenes
OpenStack Neutron behind the Scenes
 
OpenStack Neutron Behind The Senes
OpenStack Neutron Behind The SenesOpenStack Neutron Behind The Senes
OpenStack Neutron Behind The Senes
 
TFI2014 Session II - Requirements for SDN - Brian Field
TFI2014 Session II - Requirements for SDN - Brian FieldTFI2014 Session II - Requirements for SDN - Brian Field
TFI2014 Session II - Requirements for SDN - Brian Field
 
Hunting rootkits with windbg
Hunting rootkits with windbgHunting rootkits with windbg
Hunting rootkits with windbg
 
Corba model ppt
Corba model pptCorba model ppt
Corba model ppt
 
From gcc to the autotools
From gcc to the autotoolsFrom gcc to the autotools
From gcc to the autotools
 
IoTivity: From Devices to the Cloud
IoTivity: From Devices to the CloudIoTivity: From Devices to the Cloud
IoTivity: From Devices to the Cloud
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 

Dernier

Dernier (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

You SON of a Bitch :P

  • 1. OpenSM Design Overview August 2002 *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 1
  • 2. Infiniband Management The Infiniband architecture requires a “subnet manager” node to configure and maintain an Infiniband fabric. The subnet manager actually consists of several modules. The term “subnet manager” is commonly used to refer to the node that provides both a Subnet Manager (SM) and the Subnet Administrator (SA). The SM configures subnet addresses and switch routing tables. The SA serves as the subnet’s database, hosting queries from other subnet nodes. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 2
  • 3. OpenSM Features Open source Object oriented, extensible design implemented in C. Embedded documentation with Robodoc. Designed for portability to other platforms and Infiniband interfaces. Current implementation runs in Linux userspace on top of the open source AL API. Supports major SM features: multipathing (LMC > 0), partitions, multicast groups, SM fail-over. Guarantees optimal routing between any two end-points regardless of subnet topology. Supports commonly used SA queries. Complete logging facilities to the MAD frame level. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 3
  • 4. Unsupported Features The current developers do not plan to implement the following features. These capabilities could be added by other interested developers. – – – – – Switches with random forwarding tables. Subnet routers. Complete set of SA queries. Virtual lanes. GUI *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 4
  • 5. The OpenSM Object Model *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 5
  • 6. The OpenSM Object Model OpenSM models most of its internal data structures on the physical components in an IB fabric, such as nodes, switches, partitions, etc. OpenSM objects are linked together to model the physical relationships on the IB subnet. For example, a switch object contains port objects. Port objects point to other port objects. Most OpenSM objects provide functions that hide the details of the object’s implementation. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 6
  • 7. The Component Library The Component Library is a portable, open source library of useful functions and “containers” implemented in C. The Component Library provides lists, balanced trees, variable sized arrays, etc. OpenSM makes heavy use of the component library. In the source, you will notice use of functions starting with: cl_. These are calls to the Component Library. When in doubt, refer to the component library documentation! *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 7
  • 8. OpenSM Objects: A Top Down View The “highest order” OpenSM object is the OpenSM object itself. The OpenSM object spans exactly one IB subnet. The OpenSM object is built from a variety of other objects: – – – – One Subnet object One SM object One SA object A Log object & many other supporting objects The following slides will examine each of these components in detail. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 8
  • 9. The Subnet Object Name: osm_subn_t Defined in: osm_subnet.h The subnet object models the physical composition of the subnet. OpenSM builds the subnet object during discovery and maintains it during subsequent sweeps. The subnet object is built from many other object types, for example: – – – – – Node Objects – represent physical nodes Switch Objects – represent physical switches Router Objects – represent physical routers Partition Objects – represent logical partitions Multicast Group Objects – represent multicast groups. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 9
  • 10. The Subnet Object N N R R S S S S S S R R S S N N N N Master N N Node S N N N N N N N Switch R Router Port / Link N N N N This view shows the internal object-to-object linkage that models the physical subnet. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 10
  • 11. Subnet Pointer Tables The subnet object contains tables of pointers of each type of object in the subnet. The pointer tables provide quick access to objects of a particular type, be it a switch, node port, partition, etc. Where appropriate, two pointer tables are provided for each type of object: – LidTable – pointers sorted by unicast LID – GuidTable – pointers sorted by GUID The pointer tables sorted by GUID are implemented using cl_map (red-black trees). The pointer tables sorted by LID are implemented using cl_ptr_vector (variable size array of pointers). *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 11
  • 12. Pointer Table Example - Nodes Lid 1 2 3 … Ptr N N N N … N N GUID 0x123 0x456 0x789 … N N Ptr S S S S S S R R … R R S S N N N N N N N N N N Master *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 12
  • 13. How does LMC > 0 look? Lid 1 2 3 … GUID 0x123 0x456 0x789 … Ptr N N N N N N … Ptr N N S S S S S S R R … R R S S N N N N N N N N N N Master LMC > 0 is not a special case! *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 13
  • 14. Node Object Name: osm_node_t Defined in: osm_node.h A node object represents a node on the subnet. A node may be a Host Channel Adapter (HCA), a Target Channel Adapter (TCA), a Switch or a Router. A node object is built from other object types: – An array of Physical Port objects – NodeInfo attribute – NodeDescription attribute Nodes “own” their corresponding Physical Port objects. All other references to a Physical Port point to a Physical Port inside a Node. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 14
  • 15. Node Objects Node Object Node Object Node Object Node Object NodeInfo Attribute NodeInfo Attribute NodeDescription Attribute NodeDescription Attribute Physical Port 0 Physical Port 1 … Physical Port N Physical Port 0 Physical Port 1 … Physical Port N Node Objects are linked to their neighbors though their Physical Port Objects. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 15
  • 16. OpenSM Port Object Model The object model for ports is somewhat complicated. In switches, all ports share a common port GUID value, and are thus logically grouped together. In all other node types, ports have a unique GUID value. The OpenSM object model handles this by creating two types of port objects: Physical Ports and Logical Ports. Physical Port objects are “owned” by their parent Node object, be it a switch, HCA, etc. and are one-for-one with physical ports on the subnet. Logical Port objects contain 1 or more Physical Port objects that all share the same GUID value. Logical Port objects are one-to-many with physical ports on a switch, and one-toone for physical ports on any other type of node. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 16
  • 17. Physical Port Objects Name: osm_physp_t Defined in: osm_port.h Physical Port objects represent physical ports on the subnet. A Physical Port object points to its neighbor Physical Port object on the other end of the wire. A Physical Port object also contains a pointer to its parent node object. These two pointers allow functions to traverse the subnet object in a manner that mirrors the topology of the physical subnet. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 17
  • 18. Logical Port Objects Name: osm_port_t Defined in: osm_port.h A Port object represents one or more physical ports that share the same GUID. Port objects are “owned” by the Subnet object. The Port object contains: – An array of pointers to the corresponding Physical Port objects. – The GUID value of this port. – A pointer to the Node object that owns the ports. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 18
  • 19. Port Object Details Port Object Port Object Physical Port Objects Physical Port Objects guid p_node num_ports port_num port_guid port_info p_node p_remote Port Objects are logical collections of Physical Port objects that share the same port GUID. Switches have many Physical Ports per GUID, while other nodes have only 1 Physical Port per GUID. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 19
  • 20. Switch Object Name: osm_switch_t Defined in: osm_switch.h A switch object represents a switch on the subnet. A switch object is built from other object types: – – – – SwitchInfo attribute Pointer to its parent node object Switch forwarding table object Lid Matrix object *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 20
  • 21. LID Matrix Object Name: osm_lid_matrix_t Defined in: osm_matrix.h The LID Matrix object contains the number of hops to any LID value through any port in the parent switch. The LID Matrix is structured like a two dimensional array, with LID values on one axis, and port numbers on the other. The LID Matrix expands as new LIDs are added to the subnet. The routing algorithms populate the hop count values for each switch’s matrix. Constructing a switch’s LID Matrix is a precursor step to building its forwarding table. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 21
  • 22. Partition Objects Partition objects model the logical grouping of physical ports in an IB partition. Partition objects contain general information about that partition and a pointer table to port objects in the partition. Partition N Partition N Port GUID 0x203 0x789 0x922 N N Ptr N N N N N N *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. S S N N 22
  • 23. Partition Objects (continued) Pointers to partition objects are stored in a the partition pointer table indexed by P_KEY value. Partition 0x123 Partition 0x123 P_KEY 0x123 0x456 0x789 Ptr Partition 0x456 Partition 0x456 Partition 0x789 Partition 0x789 *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 23
  • 24. Multicast Groups A multicast group is identified by its LID in the range 0xC0000xFFFE, which is assigned to all ports in that multicast group. Multicast group objects look a lot like partition objects: They have a pointer table to ports in the multicast group. Multicast Group N Multicast Group N Port GUID 0x123 0x456 0x789 Ptr N N N N N N N N *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. S S N N 24
  • 25. Multicast Groups (continued) Pointers to multicast group objects are stored in a the multicast group object pointer table indexed by MLID. Multicast Group 0xC001 Multicast Group 0xC001 MLID 0xC001 0xD0B5 0xE100 Ptr Multicast Group 0xD0B5 Multicast Group 0xD0B5 Multicast Group 0xE100 Multicast Group 0xE100 *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 25
  • 26. Path Objects SM/SA must provide the information in the PathRecord attribute: – – – – Path MTU Rate Packet lifetime Preference value relative to other paths between the two nodes in question. There are too many paths in the fabric to precompute all the path records. In a fabric with N nodes, the number of paths is: N(N-1)2 LMC which grows large quickly! *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 26
  • 27. Handling Paths Path information must be determined on the fly by the SM/SA. The SM/SA traverses the subnet object, including the local copies of the switch forwarding tables, to accumulate path parameters in a path object. The path object is discarded once SA has delivered the information to the requester. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 27
  • 28. Multicast Spanning Trees Consider a multicast group that consists of nodes A, B, C. The multicast spanning tree for this group is shown in red. The spanning tree covers 3 nodes, 3 switches and 10 Physical ports. A AN N N N N S S N N N R R S S R R S S S S N N N N Master *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. C C N N N N N N B B 28
  • 29. Multicast LID Routing Multicast LID routing through a subnet is configured as a single spanning tree, i.e. loops are not allowed. Multicast forwarding tables in switches have limited size, so the spanning tree for each multicast group should include only the minimum number of switches. Some multicast members can be “send only” Non members cannot send to the group. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 29
  • 30. Multicast Spanning Trees A single subnet contains many spanning trees, one per multicast group. The subnet object models the physical interconnections between subnet components, whereas spanning trees are logical structures. Normal unicast routes can be considered trivial spanning trees. Do spanning trees need first class treatment? Can they just be implicit somehow? *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 30
  • 31. Multicast Group Creation Start with the set of ports belonging to the group. Build the optimal spanning tree by traversing the subnet object. Convert the spanning tree into forwarding table entries for the affected switches. Configure the switches. The same process works for both unicast and multicast routing. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 31
  • 32. OpenSM Methods & Algorithms *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 32
  • 33. OpenSM Object Methods Since we’re using ‘C’, we don’t have the concept of a class with member functions. The common work-around is to have pointers to functions in a ‘C’ struct. Unfortunately, these functions are never made inline by the compiler. The OpenSM approach will be to use a ‘C’ struct with separately defined functions (and hence able to inline) that operate on that structure, as done in the component library. This method offers higher performance at the expense of wordy function names. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 33
  • 34. OpenSM Method Style Follows the Component Library style. OpenSM methods use the following form: osm_<object type>_rest_of_name For example: ib_net16_t osm_port_get_base_lid( IN const osm_port_t* const p_port ); ib_net16_t osm_switch_get_base_lid( IN const osm_switch_t* const p_sw ); *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 34
  • 35. Paths & Switch Forwarding Consider the routes through the subnet from A to B. There are two possible paths: A–1–4–B and A–1–2–3–4–B Obviously, we’d like to take advantage of the shorter path! A AN N N N N S S N N N R R S S S S 2 N N 3 1 R R S S 4 N N N N Master *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. N N N N B B 35
  • 36. IB LID Routing Basics Leaf nodes are trivial. Switches with only 1 inter-switch link are also trivial to handle. Switches with multiple inter-switch links to the same switch can have statically load balanced paths. All the interesting action pertains to switches with interswitch links to more than one other switch. Switches with a random forwarding table have a “default route” like in Fibre Channel switching. Switches with a linear forwarding table do not have a default route. Thus, a route for every LID in the subnet must be programmed into the switch’s routing table. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 36
  • 37. OpenSM Subnet Path Strategy OpenSM hop-optimizes subnet routing. – Routes to from any two nodes traverse the minimum number of switches. – Given a choice of equal hop-count paths, OpenSM performs static load balancing using a simple weighted round-robin scheme. Different weights are selected for 1x, 4x and 12x links. LMC > 0 paths are also distributed using a weighted round-robin scheme. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 37
  • 38. Handling Path Records Path information must be determined on the fly by the SM/SA. The SM/SA traverses the subnet object, including the local copies of the switch forwarding tables, to accumulate path parameters. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 38
  • 39. OpenSM Subnet Configuration Overview The process is asynchronous. As response MADs arrive, OpenSM processes them and may initiate further probing. OpenSM throttles DR-MAD traffic so as to not overload the VL15 buffering capability of the switches. The various parts of the discovery algorithm are divided into relatively simple Dispatcher “controllers”. Initial discovery and subsequent sweeps share common code where possible. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 39
  • 40. OpenSM Subnet Configuration OpenSM performs a multi-stage subnet configuration: 1. Discovery & subnet inventory 2. LID assignment 3. Configure each switch’s LID Matrix for local leaf nodes. 4. Configure each switch’s LID Matrix for neighbor’s leaf nodes. 5. Configure each switch’s LID Matrix for successively more remote nodes. 6. Build and download switch forwarding tables. 7. Bring ports to ARMED state as necessary. 8. Bring ports to ACTIVE state as necessary. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 40
  • 41. Subnet Discovery & Inventory Objects representing subnet components are created on the fly as OpenSM discovers each component. The OpenSM architecture is based on a central asynchronous Dispatcher model. ‘Controllers’ around the Dispatcher perform the actual processing. The processing performed by any one Controller is narrow in scope. The Dispatcher uses its worker threads to call the processing function of each Controller. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 41
  • 42. Subnet Discovery & Inventory Subnet Object Subnet Object N NR N NR N S N N S SS N NS S N N R R S NN SN NN NN Serialized Access Attribute Rcv Attribute Rcv Controller Controller Attribute Rcv Attribute Rcv Controller Controller Attribute Rcv Attribute Rcv Controller Controller Serialized Access Attribute Request Attribute Request Controller Controller cl_dispatcher cl_dispatcher VL15 Outbound Queue Assignment Assignment Controller Controller Assignment Assignment Controller Controller Inbound SMP Inbound SMP Controller Controller *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 42
  • 43. LID Assignment OpenSM attempts to preserve existing LID assignments when possible. Base LID assignments on the subnet may follow predictable patterns in practice, but OpenSM treats every base LID as an arbitrary value. For example: – Switches are not assigned dedicated LID ranges. – The base LIDs of leaf nodes on a switch may be discontiguous OpenSM assigns consecutive base LIDs when possible. – Helps minimizes the number of switch forwarding table blocks needed to hold all possible routes. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 43
  • 44. Multicast Group Creation Start with the set of ports belonging to the group. Build the optimal spanning tree by traversing the subnet object. Convert the spanning tree into forwarding table entries for the affected switches. Configure the switches. The same process works for both unicast and multicast routing. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 44
  • 45. The Discovery Process Discovery starts by requesting the NodeInfo attribute of the node at “hop 0” which is the SM’s own HCA. The OpenSM object’s “sweep” method initiates this NodeInfo Request. Once started at hop 0, the discovery process is a self-sustaining “chain reaction” the proceeds until all nodes and ports have been discovered. The GUID Tables are populated during the discovery process. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 45
  • 46. Discovery Completion The discovery process is multi-threaded, thus nodes may be discovered in different orders from one sweep to the next. Discovery is complete when the OpenSM statistics block indicates there are no outstanding QP0 MADs. The SM Mad Controller checks for this condition each time it retires a QP0 MAD to the MAD pool. Once all QP0 MADs are off the subnet, the SM Mad Controller posts a message to the Dispatcher. This message causes the State Manager to begin processing the discovered nodes. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 46
  • 47. LID Configuration The LID Manager walks the container of discovered Port objects and compares their reported LID values with the Subnet Object’s LID table. The possible cases are: – The Port reports the same LID range that OpenSM believes it owns. – The Port reports a LID range that overlaps the LID range owned by another Port. – The port reports a LID range that OpenSM thought was unoccupied. – The port does not yet have a LID range. *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. 47