MULTIMEDIA COMMUNICATION & NETWORKS

13PIT101
Multimedia Communication & Networks

UNIT – II

Dr.A.Kathirvel
Professor & Head/IT - VCEW

Unit - II
Intra AS routing – Inter AS routing – Router
Architecture – Switch Fabric – Active Queue
Management – Head of Line blocking –
Transition from IPv4 to IPv6 – Multicasting –
Abstraction of Multicast groups – Group
Management – IGMP – Group Shared
Multicast Tree – Source based Multicast Tree –
Multicast routing in Internet – DVMRP and
MOSPF – PIM – Sparse mode and Dense
mode

The Internet Network layer
Host, router network layer functions:
Transport layer: TCP, UDP

Network
layer

IP protocol
•addressing conventions
•datagram format
•packet handling conventions

Routing protocols
•path selection
•RIP, OSPF, BGP

routing
table

ICMP protocol
•error reporting
•router “signaling”

Link layer
physical layer

#4

Hierarchical Routing
Our routing study thus far - idealization
 all routers identical
 network “flat”
… not true in practice
scale: with 50 million
destinations:
• can’t store all dest’s in routing
tables!
• routing table exchange would
swamp links!

administrative autonomy
• internet = network of networks
• each network admin may want to
control routing in its own network

#5

Hierarchical Routing
• aggregate routers into
regions, “autonomous
systems” (AS)
• routers in same AS run
same routing protocol
– “intra-AS” routing
protocol
– routers in different AS
can run different intraAS routing protocol

gateway routers
• special routers in AS
• run intra-AS routing
protocol with all other
routers in AS
• also responsible for routing
to destinations outside AS
– run inter-AS routing
protocol with other
gateway routers

#6

Intra-AS and Inter-AS routing
C.b

Gateways:

B.a
A.a

a

b

A.c

C

A

b

a
B

a
d

c

c

b

•perform inter-AS
routing amongst
themselves
•perform intra-AS
routers with other
routers in their AS

network layer
inter-AS, intra-AS routing
in
gateway A.c

link layer
physical layer

#7

Intra-AS and Inter-AS routing
C.b
A.a
a

Host
h1

b

Inter-AS
routing
between
A and B
A.c

C

a
d

B.a

c

a
B

c

b
A
Intra-AS routing
within AS A

Host
h2
b

Intra-AS routing
within AS B

 We’ll examine specific inter-AS and intra-AS Internet

routing protocols shortly
#8

Routing: Example
E

AS D
d

AS A
(OSPF)

a2

F

No Export
to F

a1

i

AS C

AS B
(OSPF intra
routing)

i2
b

AS I
#9

Routing: Example
E

AS D
d

d1
d2

AS A
(OSPF)

a2
i
F

AS C

a1
How to specify?

AS B
(OSPF intra
routing)

b
AS I

#10

IP Addressing Scheme
• We need an address to uniquely identify each
destination
• Routing scalability needs flexibility in
aggregation of destination addresses
– we should be able to aggregate a set of
destinations as a single routing unit

• Preview: the unit of routing in the Internet is
a network---the destinations in the routing
protocols are networks
#11

IP Addressing: introduction
• IP address: 32-bit
identifier for host, router
interface
• interface: connection
between host, router and
physical link
– router’s typically have
multiple interfaces
– host may have multiple
interfaces
– IP addresses associated
with interface, not host, or
router

223.1.1.1

223.1.2.1
223.1.1.2
223.1.1.4
223.1.1.3

223.1.2.9

223.1.3.27

223.1.2.2

223.1.3.2

223.1.3.1

223.1.1.1 = 11011111 00000001 00000001 00000001
223

1

1

1
#12

IP Addressing
• IP address:
– network part
• high order bits

– host part
• low order bits

223.1.1.1
223.1.1.2
223.1.1.4
223.1.1.3

• What’s a network ? (from
IP address perspective)
– device interfaces with
same network part of IP
address
– can physically reach each
other without intervening
router

223.1.2.1
223.1.2.9

223.1.3.27

223.1.2.2

LAN
223.1.3.1

223.1.3.2

network consisting of 3 IP networks
(for IP addresses starting with 223,
first 24 bits are network address)

#13

IP Addressing
How to find the networks?
• Detach each interface
from router, host
• create “islands of isolated
networks

223.1.1.2

223.1.1.1

223.1.1.4

223.1.1.3

223.1.9.2

223.1.7.0

223.1.9.1

223.1.7.1
223.1.8.1

223.1.8.0

223.1.2.6

Interconnected
system consisting
of six networks

223.1.2.1

223.1.3.27

223.1.2.2

223.1.3.1

223.1.3.2

#14

IP Addresses
given notion of “network”, let’s re-examine IP addresses:
“class-full” addressing:
class
A

0 network

B

10

C

110

D

1110

1.0.0.0 to
127.255.255.255

host

network

128.0.0.0 to
191.255.255.255

host

network
multicast address

host

192.0.0.0 to
223.255.255.255
224.0.0.0 to
239.255.255.255

32 bits

#15

IP addressing: CIDR
• classful addressing:

– inefficient use of address space, address space exhaustion
– e.g., class B net allocated enough addresses for 65K hosts,
even if only 2K hosts in that network

• CIDR: Classless InterDomain Routing
– network portion of address of arbitrary length
– address format: a.b.c.d/x, where x is # bits in network
portion of address
network
part

host
part

11001000 00010111 00010000 00000000
200.23.16.0/23
#16

CIDR Address Aggregation dAS D

d1

AS A
(OSPF)

a2

130.132.1/24
i

i->a1: I can reach
130.132/16; my path:
I
a1

130.132.2/24

AS I

intradomain routing
uses /24

130.132.3/24
#17

CIDR Address Aggregation
B
x00/24: B

A

x01/24: C

C

x10/24: E
E
x11/24: F
G
F
#18

IP addresses: how to get one?
Hosts (host portion):
• hard-coded by system admin in a file
• DHCP: Dynamic Host Configuration Protocol:
dynamically get address: “plug-and-play”

–
–
–
–
–

host broadcasts “DHCP discover” msg
DHCP server responds with “DHCP offer” msg
host requests IP address: “DHCP request” msg
DHCP server sends address: “DHCP ack” msg
The common practice in LAN and home access (why?)

#19

IP addresses: how to get one?
Network (network portion):
• get allocated portion of ISP’s address space:
ISP's block

11001000 00010111 00010000 00000000 200.23.16.0/20

Organization 0 11001000 00010111 00010000 00000000 200.23.16.0/23
Organization 1 11001000 00010111 00010010 00000000 200.23.18.0/23
Organization 2 11001000 00010111 00010100 00000000 200.23.20.0/23
...
…..
….
….
Organization 7 11001000 00010111 00011110 00000000 200.23.30.0/23

#20

Hierarchical addressing: route aggregation
Hierarchical addressing allows efficient advertisement of routing
information:

Organization 0

200.23.16.0/23
Organization 1

200.23.18.0/23
Organization 2

200.23.20.0/23
Organization 7

.
.
.

.
.
.

Fly-By-Night-ISP

“Send me anything
with addresses
beginning
200.23.16.0/20”

Internet

200.23.30.0/23
ISPs-R-Us

“Send me anything
with addresses
beginning
199.31.0.0/16”
#21

Hierarchical addressing: more specific routes
ISPs-R-Us has a more specific route to Organization 1
Organization 0

200.23.16.0/23

Organization 2

200.23.20.0/23
Organization 7

.
.
.

.
.
.

Fly-By-Night-ISP

“Send me anything
with addresses
beginning
200.23.16.0/20”

Internet

200.23.30.0/23
ISPs-R-Us
Organization 1

200.23.18.0/23

“Send me anything
with addresses
beginning 199.31.0.0/16
or 200.23.18.0/23”

#22

Network Address Translation: Motivation
 A local network uses just one public IP address as far as outside world is

concerned
 Each device on the local network is assigned a private IP address
rest of
Internet

local network
(e.g., home network)
192.168.1.0/24
192.168.1.1

192.168.1.2

192.168.1.3

138.76.29.7
192.168.1.4

All datagrams leaving local
network have same single source NAT IP
address: 138.76.29.7,
different source port numbers

Datagrams with source or
destination in this network
have 192.168.1/24 address for
source, destination (as usual)
#23

NAT: Network Address Translation
Implementation: NAT router must:

– outgoing datagrams: replace (source IP address, port #)
of every outgoing datagram to (NAT IP address, new port
#)
. . . remote clients/servers will respond using (NAT IP address,
new port #) as destination addr.

– remember (in NAT translation table) every (source IP
address, port #) to (NAT IP address, new port #)
translation pair
– incoming datagrams: replace (NAT IP address, new port
#) in dest fields of every incoming datagram with
corresponding (source IP address, port #) stored in NAT
table
#24

2: NAT router
changes datagram
source addr from
192.168.1.2, 3345 to
138.76.29.7, 5001,
updates table

NAT translation table
WAN side addr
LAN side addr

1: host 192.168.1.2
sends datagram to
128.119.40.186, 80

138.76.29.7, 5001 192.168.1.2, 3345
……
……
S: 192.168.1.2, 3345
D: 128.119.40.186, 80

192.168.1.2

1
2

S: 138.76.29.7, 5001
D: 128.119.40.186, 80

192.168.1.1
192.168.1.3

138.76.29.7
S: 128.119.40.186, 80
D: 138.76.29.7, 5001

3: Reply arrives
dest. address:
138.76.29.7, 5001

3

S: 128.119.40.186, 80
D: 192.168.1.2, 3345

4
192.168.1.4

4: NAT router
changes datagram
dest addr from
138.76.29.7, 5001 to 192.168.1.2, 3345

#25

Network Address Translation: Advantages
• No need to be allocated range of addresses from
ISP: - just one public IP address is used for all
devices
– 16-bit port-number field allows 60,000 simultaneous
connections with a single LAN-side address !
– can change ISP without changing addresses of
devices in local network
– can change addresses of devices in local network
without notifying outside world

• Devices inside local net not explicitly
addressable, visible by outside world (a security
plus)

#26


• If both hosts are behind different NAT, they
will have difficulty establishing connection
• NAT is controversial:

– routers should process up to only layer 3
– violates end-to-end argument

• NAT possibility must be taken into account by app
designers, e.g., P2P applications

– address shortage should instead be solved by
having more addresses --- IPv6
#27

IP addressing: the last word...

Q: How does an ISP get block of addresses?
A: ICANN: Internet Corporation for Assigned
Names and Numbers

– allocates addresses
– manages DNS
– assigns domain names, resolves disputes

#28

Getting a datagram from source to dest.
routing table in A
Dest. Net. next router Nhops
223.1.1
223.1.2
223.1.3

IP datagram:
misc
fields

source
IP addr

dest
IP addr

data

A

 datagram remains unchanged,

as it travels source to
destination
 addr fields of interest here
 mainly dest. IP addr

223.1.1.4
223.1.1.4

1
2
2

223.1.1.1

223.1.1.2
223.1.1.4

223.1.2.1
223.1.2.9

B
223.1.1.3
223.1.3.1

223.1.3.27

223.1.2.2

E

223.1.3.2

#29


misc
data
fields 223.1.1.1 223.1.1.3

223.1.1
223.1.2
223.1.3

Starting at A, given IP datagram
addressed to B:
 look up net. address of B

 find B is on same net. as A

A

 link layer will send datagram directly

to B inside link-layer frame
 B and A are directly connected

223.1.1.4
223.1.1.4

1
2
2

223.1.1.1

223.1.1.2
223.1.1.4

223.1.2.1
223.1.2.9

B
223.1.1.3
223.1.3.1

223.1.3.27

223.1.2.2

E

223.1.3.2

#30

misc
fields 223.1.1.1 223.1.2.2


data

223.1.1
223.1.2
223.1.3

Starting at A, dest. E:

 look up network address of E
 E on different network







A, E not directly attached
routing table: next hop router to E
is 223.1.1.4
link layer sends datagram to router
223.1.1.4 inside link-layer frame
datagram arrives at 223.1.1.4
continued…..

A

223.1.1.4
223.1.1.4

1
2
2

223.1.1.1

223.1.1.2
223.1.1.4

223.1.2.1
223.1.2.9

B
223.1.1.3
223.1.3.1

223.1.3.27

223.1.2.2

E

223.1.3.2

#31

misc
fields 223.1.1.1 223.1.2.2

Dest. next
network router Nhops interface

data

Arriving at 223.1.4, destined for
223.1.2.2
 look up network address of E

 E on same network as router’s

interface 223.1.2.9
 router, E directly attached
 link layer sends datagram to
223.1.2.2 inside link-layer frame
via interface 223.1.2.9
 datagram arrives at 223.1.2.2!!!
(hooray!)

223.1.1
223.1.2
223.1.3
A

-

1
1
1

223.1.1.4
223.1.2.9
223.1.3.27

223.1.1.1

223.1.1.2
223.1.1.4

223.1.2.1
223.1.2.9

B
223.1.1.3
223.1.3.1

223.1.3.27

223.1.2.2

E

223.1.3.2

#32

IP datagram format
IP protocol version
number
header length
(bytes)
“type” of data
max number
remaining hops
(decremented at
each router)
upper layer protocol
to deliver payload to

32 bits
head. type of
T
len service
fragment
16-bit identifier flgs
offset
upper
time to
Internet
layer
live
checksum

ver

total datagram
length (bytes)

for
fragmentation/
reassembly

32 bit source IP address
32 bit destination IP address
Options (if any)

data
(variable length,
typically a TCP
or UDP segment)

E.g. timestamp,
record route
taken, specify
list of routers
to visit.

#33

IP Fragmentation & Reassembly
•

•

network links have MTU
(max.transfer size) - largest
possible link-level frame.
– different link types, different
MTUs
large IP datagram divided
(“fragmented”) within net
– one datagram becomes several
datagrams
– “reassembled” only at final
destination
– IP header bits used to identify,
order related fragments

fragmentation:
in: one large datagram
out: 3 smaller datagrams

reassembly

4-34

IP Fragmentation and Reassembly
Example
 4000 byte datagram
 MTU = 1500 bytes

length ID fragflag
=4000 =x
=0

offset
=0

One large datagram becomes
several smaller datagrams
length ID fragflag
=1500 =x
=1
length ID fragflag
=1500 =x
=1

1480 bytes in
data field
offset =
1480/8

offset
=0
offset
=185

length ID fragflag
=1060 =x
=0

offset
=370

Network Layer

4-35

Routing in the Internet
• The Global Internet consists of Autonomous Systems (AS)
interconnected with each other:
– Stub AS: small corporation
– Multihomed AS: large corporation (no transit)
– Transit AS: provider

• Two-level routing:
– Intra-AS: administrator is responsible for choice
– Inter-AS: unique standard

Lecture 6: Network Layer

#36

Internet AS Hierarchy
Inter-AS border (exterior gateway) routers

Intra-AS interior (gateway) routers

#37

Intra-AS Routing
• Also known as Interior Gateway Protocols (IGP)
• Most common IGPs:

– RIP: Routing Information Protocol
– OSPF: Open Shortest Path First
– IGRP: Interior Gateway Routing Protocol (Cisco
propr.)


#38

RIP ( Routing Information Protocol)
• Distance vector algorithm
• Included in BSD-UNIX Distribution in 1982
• Distance metric: # of hops (max = 15 hops)
– why?

• Distance vectors: exchanged every 30 sec via Response
Message (also called advertisement)
• Each advertisement: route to up to 25 destination nets


#39

RIP (Routing Information Protocol)
z

w

x

A

y
D

B

C
Destination Network

w
y
z
x

….

Next Router

Num. of hops to dest.

A
B
B
--

….

2
2
7
1
....

Routing table in D

#40

RIP: Link Failure and Recovery
If no advertisement heard after 180 sec --> neighbor/link declared
dead

– routes via neighbor invalidated
– new advertisements sent to neighbors
– neighbors in turn send out new advertisements (if
tables changed)
– link failure info quickly propagates to entire net
– poison reverse used to prevent ping-pong loops
(infinite distance = 16 hops)


#41

OSPF (Open Shortest Path First)
• “open”: publicly available
• Uses Link State algorithm
– LS packet dissemination
– Topology map at each node
– Route computation using Dijkstra’s algorithm

• OSPF advertisement carries one entry per neighbor router
• Advertisements disseminated to entire AS (via flooding)


#42

OSPF “advanced” features (not in RIP)
• Security: all OSPF messages authenticated (to prevent
malicious intrusion); TCP connections used
• Multiple same-cost paths allowed
– only one path in RIP

• For each link, multiple cost metrics for different ToS (eg,
satellite link cost set “low” for best effort; high for real time)
• Integrated uni- and multicast support:
– Multicast OSPF (MOSPF) uses same topology data base as OSPF

• Hierarchical OSPF in large domains.


#43

Hierarchical OSPF


#44

Hierarchical OSPF
• Two-level hierarchy: local area, backbone.

– Link-state advertisements only in area
– each nodes has detailed area topology; only know
direction (shortest path) to nets in other areas.
• Area border routers: “summarize” distances to nets in own
area, advertise to other Area Border routers.
• Backbone routers: run OSPF routing limited to backbone.
• Boundary routers: connect to other ASs.


#45

IGRP (Interior Gateway Routing Protocol)
•
•
•
•
•

CISCO proprietary; successor of RIP (mid 80s)
Distance Vector, like RIP
several cost metrics (delay, bandwidth, reliability, load etc)
uses TCP to exchange routing updates
Loop-free routing via Distributed Updating Alg. (DUAL)
based on diffused computation


#46

Inter-AS routing


#47

Internet inter-AS routing: BGP
• BGP (Border Gateway Protocol): the de facto standard
• Path Vector protocol:

– similar to Distance Vector protocol
– each Border Gateway broadcast to neighbors
(peers) entire path (I.e, sequence of ASs) to
destination
– E.g., Gateway X may send its path to dest. Z:
Path (X,Z) = X,Y1,Y2,Y3,…,Z

#48

Suppose: gateway X send its path to peer gateway W
• W may or may not select path offered by X
– cost, policy (don’t route via competitors AS), loop prevention
reasons.
• If W selects path advertised by X, then:
Path (W,Z) = W, Path (X,Z)
• Note: X can control incoming traffic by controlling its route
advertisements to peers:
– e.g., don’t want to route traffic to Z -> don’t advertise any routes
to Z


#49

• BGP messages exchanged using TCP.
• BGP messages:

– OPEN: opens TCP connection to peer and
authenticates sender
– UPDATE: advertises new path (or withdraws old)
– KEEPALIVE keeps connection alive in absence of
UPDATES; also ACKs OPEN request
– NOTIFICATION: reports errors in previous msg;
also used to close connection

#50

Why different Intra- and Inter-AS routing ?

Policy:
• Inter-AS: admin wants control over how its traffic routed, who
routes through its net.
• Intra-AS: single admin, so no policy decisions needed

Scale:
• hierarchical routing saves table size, reduced update traffic
Performance:
• Intra-AS: can focus on performance
• Inter-AS: policy may dominate over performance


#51

Extra


#52

ICMP: Internet Control Message Protocol
•

•

•

used by hosts & routers to
communicate network-level
information
– error reporting: unreachable host,
network, port, protocol
– echo request/reply (used by ping)
network-layer “above” IP:
– ICMP msgs carried in IP
datagrams
ICMP message: type, code plus first 8
bytes of IP datagram causing error

Type
0
3
3
3
3
3
3
4

Code
0
0
1
2
3
6
7
0

8
9
10
11
12

0
0
0
0
0

Network Layer

description
echo reply (ping)
dest. network unreachable
dest host unreachable
dest protocol unreachable
dest port unreachable
dest network unknown
dest host unknown
source quench (congestion
control - not used)
echo request (ping)
route advertisement
router discovery
TTL expired
bad IP header

4-53

Traceroute and ICMP
• Source sends series of UDP
segments to dest
– First has TTL =1
– Second has TTL=2, etc.
– Unlikely port number
• When nth datagram arrives to nth
router:
– Router discards datagram
– And sends to source an ICMP
message (type 11, code 0)
– Message includes name of
router& IP address

• When ICMP message arrives,
source calculates RTT
• Traceroute does this 3 times
Stopping criterion
• UDP segment eventually arrives
at destination host
• Destination returns ICMP “dest
port unreachable” packet (type 3,
code 3)
• When source gets this ICMP,
stops.

Network Layer

4-54

Example: tracert www.yahoo.com
Tracing route to www-real.wa1.b.yahoo.com [69.147.76.15]
over a maximum of 30 hops:
1
2
3
4
5
6
7
8
9
10
11

<1 ms <1 ms <1 ms 132.67.250.1
<1 ms 1 ms <1 ms dmz-cc-gw.math.tau.ac.il [132.67.252.2]
<1 ms <1 ms <1 ms tel-aviv.tau.ac.il [132.66.4.1]
1 ms <1 ms <1 ms gp1-tau-ge.ilan.net.il [128.139.191.70]
1 ms *
1 ms gp0-gp1-te.ilan.net.il [128.139.188.2]
87 ms 86 ms 87 ms iucc.rt1.fra.de.geant2.net [62.40.125.121]
87 ms 87 ms 87 ms TenGigabitEthernet7-3.ar1.FRA4.gblx.net [207.138.144.45]
177 ms 177 ms 177 ms 204.245.39.226
180 ms 177 ms 265 ms ae1-p151.msr2.re1.yahoo.com [216.115.108.23]
177 ms 177 ms 177 ms te-9-4.bas-a2.re1.yahoo.com [66.196.112.203]
177 ms 177 ms 177 ms f1.www.vip.re1.yahoo.com [69.147.76.15]

Trace complete.

IPv6
• Initial motivation: 32-bit address space soon to
be completely allocated.
• Additional motivation:
– header format helps speed processing/forwarding
– header changes to facilitate QoS
IPv6 datagram format:
– fixed-length 40 byte header
– no fragmentation allowed
Network Layer

4-56

IPv6 Header (Cont)
Priority: identify priority among datagrams in flow
Flow Label: identify datagrams in same “flow.”
(concept of“flow” not well defined).
Next header: identify upper layer protocol for data

Network Layer

4-57

Other Changes from IPv4
• Checksum: removed entirely to reduce
processing time at each hop
• Options: allowed, but outside of header,
indicated by “Next Header” field
• ICMPv6: new version of ICMP
– additional message types, e.g. “Packet Too Big”
– multicast group management functions

Network Layer

4-58

Transition From IPv4 To IPv6
• Not all routers can be upgraded simultaneous
– no “flag days”
– How will the network operate with mixed IPv4 and
IPv6 routers?

• Tunneling: IPv6 carried as payload in IPv4
datagram among IPv4 routers

Network Layer

4-59

Tunneling
B

IPv6

B

C

IPv6

IPv6

IPv4

F
IPv6

D

E

F

IPv4

IPv6

IPv6

IPv6

A

E
IPv6

A
Logical view:

tunnel

Physical view:

data

A-to-B:
IPv6

Src:B
Dest: E

Src:B
Dest: E

Flow: X
Src: A
Dest: F

Flow: X
Src: A
Dest: F

data

Flow: X
Src: A
Dest: F

data

B-to-C:
IPv6 inside
IPv4
Network Layer

B-to-C:
IPv6 inside
IPv4

Flow: X
Src: A
Dest: F

data

E-to-F:
IPv6

4-60

IPv6 status report
• Operating systems –

– wide support – early 2000
– Windows (2000, XP, Vista), BSD, Linux, Apple

• Networking infrastructure
– Cisco

• Deployment
– Slow

• Penetration

– Host - minor (less than 1%)
– Used in 2008 in China Olympic games

• Motivation: CIDR & NAT

Lecture 7: Network Layer II

#61

Queuing Disciplines
• Each router must implement some queuing
discipline
• Queuing allocates both bandwidth and buffer
space:
– Bandwidth: which packet to serve (transmit) next
– Buffer space: which packet to drop next (when
required)

• Queuing also affects latency

Typical Internet Queuing
• FIFO + drop-tail

– Simplest choice
– Used widely in the Internet

• FIFO (first-in-first-out)

– Implies single class of traffic

• Drop-tail

– Arriving packets get dropped when queue is full regardless
of flow or importance

• Important distinction:

– FIFO: scheduling discipline
– Drop-tail: drop policy

FIFO + Drop-tail Problems
• Leaves responsibility of congestion control to
edges (e.g., TCP)
• Does not separate between different flows
• No policing: send more packets  get more
service
• Synchronization: end hosts react to same
events

Active Queue Management
• Design active router queue management to aid
congestion control
• Why?
– Routers can distinguish between propagation and
persistent queuing delays
– Routers can decide on transient congestion, based
on workload

Active Queue Designs
• Modify both router and hosts
– DECbit – congestion bit in packet header

• Modify router, hosts use TCP
– Fair queuing
• Per-connection buffer allocation

– RED (Random Early Detection)
• Drop packet or set bit in packet header as soon as
congestion is starting

Internet Problems
• Full queues

– Routers are forced to have have large queues to
maintain high utilizations
– TCP detects congestion from loss

• Forces network to have long standing queues in steadystate

• Lock-out problem

– Drop-tail routers treat bursty traffic poorly
– Traffic gets synchronized easily  allows a few
flows to monopolize the queue space

Design Objectives
• Keep throughput high and delay low
• Accommodate bursts
• Queue size should reflect ability to accept
bursts rather than steady-state queuing
• Improve TCP performance with minimal
hardware changes

Lock-out Problem
• Random drop
– Packet arriving when queue is full causes some
random packet to be dropped

• Drop front
– On full queue, drop packet at head of queue

• Random drop and drop front solve the lock-out
problem but not the full-queues problem

Full Queues Problem
• Drop packets before queue becomes full (early
drop)
• Intuition: notify senders of incipient
congestion
– Example: early random drop (ERD):
• If qlen > drop level, drop each new packet with fixed
probability p
• Does not control misbehaving users

Random Early Detection (RED)
• Detect incipient congestion, allow bursts
• Keep power (throughput/delay) high
– Keep average queue size low
– Assume hosts respond to lost packets

• Avoid window synchronization
– Randomly mark packets

• Avoid bias against bursty traffic
• Some protection against ill-behaved users

RED Algorithm
• Maintain running average of queue length
• If avgq < minth do nothing
– Low queuing, send packets through

• If avgq > maxth, drop packet
– Protection from misbehaving sources

• Else mark packet in a manner proportional to
queue length
– Notify sources of incipient congestion

RED Operation
Min thresh

Max thresh

P(drop)

Average Queue Length

1.0

maxP
minth

maxth

Avg queue length

RED Algorithm
• Maintain running average of queue length
– Byte mode vs. packet mode – why?

• For each packet arrival

– Calculate average queue size (avg)
– If minth ≤ avgq < maxth
• Calculate probability Pa
• With probability Pa

– Mark the arriving packet

• Else if maxth ≤ avg

– Mark the arriving packet

Queue Estimation
• Standard EWMA: avgq - (1-wq) avgq + wqqlen
– Special fix for idle periods – why?

• Upper bound on wq depends on minth

– Want to ignore transient congestion
– Can calculate the queue average if a burst arrives

• Set wq such that certain burst size does not exceed minth

• Lower bound on wq to detect congestion
relatively quickly
• Typical wq = 0.002

Extending RED for Flow Isolation
• Problem: what to do with non-cooperative
flows?
• Fair queuing achieves isolation using per-flow
state – expensive at backbone routers
– How can we isolate unresponsive flows without
per-flow state?

• RED penalty box

– Monitor history for packet drops, identify flows
that use disproportionate bandwidth
– Isolate and punish those flows

FRED
• Fair Random Early Drop (Sigcomm, 1997)
• Maintain per flow state only for active flows
(ones having packets in the buffer)
• minq and maxq  min and max number of
buffers a flow is allowed occupy
• avgcq = average buffers per flow
• Strike count of number of times flow has
exceeded maxq

FRED – Fragile Flows
• Flows that send little data and want to avoid
loss
• minq is meant to protect these
• What should minq be?
– When large number of flows  2-4 packets
• Needed for TCP behavior

– When small number of flows  increase to avgcq

FRED
• Non-adaptive flows
– Flows with high strike count are not allowed more
than avgcq buffers
– Allows adaptive flows to occasionally burst to
maxq but repeated attempts incur penalty

Stochastic Fair Blue
• Same objective as RED Penalty Box
– Identify and penalize misbehaving flows

• Create L hashes with N bins each
–
–
–
–

Each bin keeps track of separate marking rate (pm)
Rate is updated using standard technique and a bin size
Flow uses minimum pm of all L bins it belongs to
Non-misbehaving flows hopefully belong to at least one
bin without a bad flow
• Large numbers of bad flows may cause false positives

Stochastic Fair Blue
• False positives can continuously penalize same
flow
• Solution: moving hash function over time
– Bad flow no longer shares bin with same flows
– Is history reset does bad flow get to make
trouble until detected again?
• No, can perform hash warmup in background

Buffers
Fabric

Buffer locations
•
•
•
•
•

Input ports
Output ports
Inside fabric
Shared Memory
Combination of all
# 84

fabric

Outputs

Inputs

Input Queuing

# 85

Input Buffer : properties
•
•
•
•
•

Input speed of queue – no more than input line
Need arbiter (running N times faster than input)
FIFO queue
Head of Line (HoL) blocking .
Utilization:
• Random destination
• 1- 1/e = 59% utilization

• due to HoL blocking

# 86

Head of Line Blocking
Stadium

Beer/Soda/Chips

Kwiky Mart
# 90

Output Queuing
Stadium

Beer/Soda/Chips

Kwiky Mart
# 91

A

B C A C B

B

C

# 92

A

B A C B C

ACB

B

C

# 93

A

B B C C B

ACBCA

B

C

# 94

VOQ—Virtual Output Queues
A
ARB

B C A C B

B

C

# 95

A
ARB

AA
B A C B C

B

B

C

C

# 96

A
ARB

AAAA
B C C A B

B

B

C

C

# 97

Performance Issue with Cross-Bars
58.6%

Source: M. J. Karol, M.G. Hluchyj, S. P. Morgan, “Input Versus Output Queueing [sic] on a
Space-Division Packet Switch”, IEEE Transactions on Communications, Vol COM-35, No 12,
December 1987, page 1353
# 98

Overcoming HoL blocking:
look-ahead
 The fabric looks ahead into the input buffer
for packets that may be transferred if they
were not blocked by the head of line.
 Improvement depends on the depth of the
look ahead.
 This corresponds to virtual output queues
where each input port has buffer for each
output port.
# 99

Input Queuing
Virtual output queues

# 100

Overcoming HoL blocking:
output expansion
 Each output port is expanded to L output
ports
 The fabric can transfer up to L packets to
the same output instead of one cell.

Karol and Morgan,
IEEE transaction on communication, 1987: 1347-1356
# 101

Input Queuing
Output Expansion
L

fabric

# 102

Output Queuing
The “ideal”

2
1

1
2

1

2 1
2
1
1

2

2
1

# 103

Output Buffer : properties
•
•
•
•

No HoL problem
Output queue needs to run faster than input lines
Need to provide for N packets arriving to same queue
solution: limit the number of input lines that can be destined
to the output.

# 104

MEMORY

FABRIC

FABRIC

Shared Memory

a common pool of buffers divided into
linked lists indexed by output port number

# 105

Shared Memory: properties
•
•
•
•

Packets stored in memory as they arrive
Resource sharing
Easy to implement priorities
Memory is accessed at speed equal to sum of the
input or output speeds
• How to divide the space between the sessions

# 106

Multicast: one sender to many receivers
• Multicast: one sender to many receivers
– analogy: one teacher to many students

• Question: how to achieve multicast

Internet Multicast Service Model

multicast group concept:
– hosts send IP datagram pkts to multicast group
– hosts that have “joined” that multicast group will
receive pkts sent to that group

Multicast groups
• host group semantics:

– anyone can “join” (receive) multicast group
– anyone can send to multicast gorup
– no network layer identification to hosts of
members
• session/application-level mechanisms needed for
membership identification, privacy

• needed: infrastructure to deliver mcastaddressed packets to all hosts that have joined
that multicast group

Internet Multicast Addressing
• class D Internet addresses reserved for multicast:

• indirection: mcast address does not name a
destination, but host group to receive packet
packet addr:
226.17.30.197

Joining a mcast group: a two-step process
• local: host informs local mcast router of
desire to join group: IGMP
• wide area: local router interacts with other
routers to receive mcast packet flow
– many protocols (e.g., DVMRP, MOSPF, PIM)

IGMP: Internet Group Management Protocol

• host: sends IGMP report when application
joins mcast group
– IP_ADD_MEMBERSHIP socket option
– host need not explicitly “unjoin” group when
leaving

• router: sends IGMP query at regular intervals
– host belonging to a mcast group must reply to
query

IGMP
IGMP version 1
• router: Host Membership
Query msg broadcast on
LAN to all hosts
• host: Host Membership
Report msg to indicate
group membership
– randomized delay before
responding
– implicit leave via no reply to
Query

• RFC 1112

IGMP v2: additions include
• group-specific Query
• Leave Group msg
– last host replying to Query can
send explicit Leave Group msg
– router performs group-specific
query to see if any hosts left in
group
– RFC 2236

IGMP v3: under development as
Internet draft

Multicast Issues
• Naming
• Membership Management
• Routing

IP Multicast Naming
• Class D address represents multicast group
– E.g. 226.17.30.197

• Datagram with destination address set to group
delivered to all hosts in the group
– Indirection
– 226.17.30.197 => 65.30.1.2, 66.8.3.53, 128.32.75.60, …
– Sender may or may not be in the group

• No address hierarchy or subnets
– How is routing done?

Membership Management
• Some other questions:
– Who is part of the group?
– How does one join?
– How does one leave?
– Who decides if it’s OK?

• Membership management answers these

IGMP
• Internet Group Management Protocol
• Runs only between host and router
– Multicast routing takes care of communication
between routers

IGMP
hosts

host-to-router protocol
(IGMP)
routers

multicast routing protocols
(various)

IGMP query
• IGMP membership_query
– Router sends query
– Find out all groups a host belongs to
– Can query a specific group instead
– Sent to the “all systems group” (224.0.0.1) with
TTL=1

IGMP report
• IGMP membership_report
– Response from host to a query
– Can send report unsolicited
• Join group this way!

• IGMP leave_group
– Optional
– Router will clean up membership info on next
membership_query

IGMP properties
• Minimalist semantics

– Host controlled membership

• No decision about:

– Who controls membership
– Invitations
– How to find groups and join them

• Move these decisions to application layer

Soft state
• Host is authoritative on group membership
• Router maintains “soft state”
• A crashed router soon recovers
– Sends a new membership_query
– Misdelivers packets for a little while
• OK by IP service model!

Protocol types
• Dense mode protocols
–
–
–
–
–

assumes dense group membership
Source distribution tree and NACK type
DVMRP (Distance Vector Multicast Routing Protocol)
PIM-DM (Protocol Independent Multicast, Dense Mode)
Example: Company-wide announcement

• Sparse mode protocol
–
–
–
–

assumes sparse group membership
Shared distribution tree and ACK type
PIM-SM (Protocol Independent Multicast, Sparse Mode)
Examples: a Shuttle Launch

CS 640

123

Multicast Routing
• A number of routers have hosts that belong to
a multicast group
• How to connect them (and others) in a tree?
– Shared tree: single tree for all
– Source-based tree: many trees

Core-Based Tree
• Tree rooted at a core
• To join a group, send unicast message towards
core
– Add all links traversed until hit existing tree

Choice of Core
• If core close to source, efficiency is good
• If core far from source, efficiency falls
– Delay up to twice optimal

• Optimal core placement is NP-hard
– Use heuristics

Source-based Trees
• Different tree for each possible source
– Why?

• Reverse path forwarding to figure out tree
• Pruning to leave out routers

Pruning
• Prune when no attached members or
downstream routers
• Propagate prune messages upstream
S: source
R1

router with attached
group member

R4
R2

router with no attached
group member

P
R5

R3
R6

P
R7

P

prune message
links with multicast
forwarding

DVMRP
•
•
•
•

Distance Vector Multicast Routing Protocol
DV + RPF + Pruning
DV vector carries distance to multicast sources
Pruning carries a timeout
– Afterwards, traffic delivery is resumed

• Explicit graft message to reverse pruning
– Done upon join

MOSPF
• Multicast Extensions to OSPF
• Link-state advertisements include multicast group
membership
– Only report directly connected hosts

• Compute shortest-path spanning tree rooted at
source
– On demand, when receiving packet from source for the
first time
– Forward multicast traffic along tree

MOSPF performance
• Global state allows source-based trees to be
used
– Faster delivery of messages

• Overhead
– Joins and leaves flooded to all routers
– Any change may cause whole tree to be
recomputed

PIM
• Protocol Independent Multicast

– Uses routing tables, but agnostic of how they are built

• Two settings:

– Dense: most routers members of a group
• Use RPF flooding with pruning

– Sparse: most routers not members of a group

• Use shared tree or source-based tree based on data characteristics

• Uses soft-state

Sparse vs. Dense
Dense Mode
• Dense participants
• B/W plentiful
• Membership assumed
until pruned
• Data driven

Sparse Mode
• Sparse participants
• B/W overhead
significant
• Membership explicitly
requested
• Receiver driven

Shared v. Source-based Trees
• Shared trees used initially
– Tree rooted at rendezvouz-point (RP)

• Can switch to source-based trees when data
rate is high
– RP sends a Join message to source
– Each router independently decides to switch to
source-based tree, sends Join to source

Shared Tree Example
G

G

RP

S

G

PIM Receiver Join
G

G
G

Report G

RP

Join *,G

S

What if
join is here?

G

PIM Shared Tree After Join
G

G
G
RP

S

G

G

PIM Source Based Tree
G

G
G
RP

S
Join s,g

G

G

PIM Source Based Tree
G

G
G
RP

S

G

G

PIM routing tables
• Routing entries of the form (s,g)
– s - source
– g - group

• Wildcard entries (*,g) for shared-group trees
• Packets are routed using best match

MULTIMEDIA COMMUNICATION & NETWORKS

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to MULTIMEDIA COMMUNICATION & NETWORKS

Similar to MULTIMEDIA COMMUNICATION & NETWORKS (20)

More from Kathirvel Ayyaswamy

More from Kathirvel Ayyaswamy (20)

Recently uploaded

Recently uploaded (20)

MULTIMEDIA COMMUNICATION & NETWORKS