How the TCP/IP Protocol Works

                            Les Cottrell – SLAC
Lecture # 1 presented at the 26th International Nathiagali Summer College on Physics
        and Contemporary Needs, 25th June – 14th July, Nathiagali, Pakistan




          Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end
                   Performance Monitoring (IEPM), also supported by IUPAP      1
Overview
• This is not a lecture on how to program TCP/IP,
  rather an introduction to how the major portions work
• IP
• Addressing: IP addresses, ARP, routing
• ICMP
• UDP
• TCP: flow control, error recovery, establishment,
  disconnect
• References:
  – “Internetworking with TCP/IP, volume I, principles, protocols & Architecture”,
    by Douglas Comer
  – “TCP/IP Illustrated: the protocols”, by W. Richard Stevens
  – Most information also available free via Web searches
                                                                  2
Internet Protocol (IP RFC-791)
TCP/IP Internet provides 3 layers of service
                   Application services
                    Transport Services
         Connectionless packet delivery service
•Layering allows one to replace one service without affecting
others
•IP layer (basic unit of transfer in TCP/IP) provides:
   •Best-effort (does not discard capriciously), unreliable (no
   guarantees)
       •Packet may be lost, duplicated, out-of-order with no
       notification
   •Connectionless (each packet treated independently)
   •IP software provides routing                    3
Internet datagram
• Basic transfer unit
        Datagram header         Datagram data area
• Format of Internet datagram
   0    4        8             16     19      24         31
   Vers Hlen Type of serv.              Total length
         Identification          Flags Fragment offset
       TTL          Protocol         Header Checksum
                      Source IP address
                    Destination IP address
              IP Options (if any)                Padding
                             Data
                              …
                                                      4
IP datagram format (cont.)
• Vers (4 bits): version of IP protocol (IPv4=4)
• Hlen (4 bits): Header length in 32 bit words; without
  options (usual case) = 5 words (20 bytes)
• Type of Service – TOS (8 bits): little used in past, now
  being used for QoS
• Total length (16 bits): length of datagram in bytes, includes
  header and data
• Time to live – TTL (8 bits): specifies how long datagram is
  allowed to remain in internet
   – Routers decrement by 1
   – When TTL = 0 router discards datagram
   – Prevents infinite loops
• Protocol (8 bits): specifies the format of the data area
   – Protocol numbers administered by central authority to guarantee
     agreement, e.g. TCP=6, UDP=17 …                     5
IP Datagram format (cont.)
• Source & destination IP address (32 bits each):
  contain IP address of sender and intended recipient
• Options (variable length): Mainly used to record a
  route, or timestamps, or specify routing




                                            6
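The header layout on the previous slides can be decoded with a few lines of Python. This is only a sketch: the function name `parse_ipv4_header` and the sample header values are made up for illustration, and options (if any) are ignored.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_hlen, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_hlen >> 4,            # Vers: upper 4 bits
        "hlen_words": ver_hlen & 0x0F,       # Hlen: lower 4 bits, in 32 bit words
        "tos": tos,
        "total_length": total_len,           # header + data, in bytes
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,
        "ttl": ttl,
        "protocol": proto,                   # e.g. 1 = ICMP, 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hypothetical header: IPv4, Hlen=5, DF set, TTL 64, protocol 1 (ICMP).
# The checksum is left at 0 here, so it is not a valid on-the-wire header.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 84, 1, 0x4000, 64, 1, 0,
                     socket.inet_aton("134.79.16.101"),
                     socket.inet_aton("210.56.8.8"))
header = parse_ipv4_header(sample)
print(header)
```

Real headers would come from a packet capture or a raw socket; both are outside the scope of this sketch.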
IP Fragmentation
• How do we send a datagram of say 1400 bytes through a
  link that has a Maximum Transfer Unit (MTU) of say 620
  bytes?
• Answer: the datagram is broken into fragments


 Net 1 (MTU=1500)  ->  Net 2 (MTU=620)  ->  Net 3 (MTU=1500)
   – Router fragments a datagram carrying 1400 data bytes
      • Into fragments carrying 600, 600 and 200 data bytes (each fragment also gets its own 20 byte IP header)
      • Routers do NOT reassemble; reassembly is left to the destination host



                                                                    7
Fragmentation Control
• Identification: copied into fragment, allows destination to
  know which fragments belong to which datagram
• Fragment Offset (13 bits): specifies the offset in the
  original datagram of the data being carried in the fragment
   – Measured in units of 8 bytes starting at 0
• Flags (3 bits): control fragmentation
   – Reserved (0-th bit)
   – Don’t Fragment – DF (1st bit):
      • useful for simple (computer bootstrap) applications that can’t handle fragments
      • also used for MTU discovery (see later)
      • if a router needs to fragment and can’t, it discards the datagram & sends an error to the source
   – More Fragments – MF (least sig bit): set on every fragment except the
     last, so a cleared bit tells the receiver it has got the last fragment
• TCP traffic is hardly ever fragmented (due to use of MTU
  discovery). About 0.5% - 0.1% of TCP packets are
  fragmented .                                    8
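The fragmentation arithmetic can be sketched as below. It assumes the 1400 bytes in the earlier example are datagram data and that every fragment carries a 20 byte header with no options; the function name `fragment` is hypothetical.

```python
def fragment(data_len: int, mtu: int, header_len: int = 20):
    """Return (offset_field, data_bytes, more_fragments) for each fragment.

    offset_field is the value carried in the Fragment Offset field,
    i.e. the byte offset divided by 8.
    """
    # Data per fragment must be a multiple of 8 bytes (except in the last one)
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len      # More Fragments flag
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

frags = fragment(1400, 620)
for offset_field, size, more in frags:
    print(f"offset field={offset_field} (byte {offset_field * 8}), "
          f"data={size} bytes, MF={int(more)}")
```

For an MTU of 620 this gives fragments of 600, 600 and 200 data bytes, with the More Fragments flag cleared only on the last one.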
Fragment series composition




   Offset=0         Offset=1480      Offset=2960      Offset=3440
   More frags       More frags       More frags       Last frag




NB. If the data area contains its own header (e.g. a transport header), that header is not
replicated in each fragment                          9
Internet Addressing
• IP address is a 32 bit integer
  – Refers to interface rather than host
  – Consists of network and host portions
       • Enables routers to keep 1 entry/network instead of 1/host
  –   Class A, B, C for unicast
  –   Class D for multicast
  –   Class E reserved
  –   Classless addresses
• Written as 4 octets/bytes in decimal format
  – E.g. 134.79.16.1, 127.0.0.1


                                                          10
Internet Class-based addresses
• Class A: large number of hosts, few networks
   – 0nnnnnnn hhhhhhhh hhhhhhhh hhhhhhhh
      • 7 network bits (0 and 127 reserved, so 126 networks), 24 host bits (> 16M
        hosts/net)
      • Initial byte 1-126 (decimal); 127 is reserved for loopback
• Class B: medium number of hosts and networks
   – 10nnnnnn nnnnnnnn hhhhhhhh hhhhhhhh
      • 16,384 class B networks, 65,534 hosts/network
      • Initial byte 128-191 (decimal)
• Class C: large number of small networks
   – 110nnnnn nnnnnnnn nnnnnnnn hhhhhhhh
      • 2,097,152 networks, 254 hosts/network
      • Initial byte 192-223 (decimal)
• Class D: 224-239 (decimal) Multicast [RFC1112]
• Class E: 240-255 (decimal) Reserved
                                                                  11
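Since the class is determined by the leading bits, it can be read off the first octet. A minimal sketch (the function name `address_class` is made up for illustration):

```python
def address_class(address: str) -> str:
    """Return the classful address class (A-E) from the first octet."""
    first_octet = int(address.split(".")[0])
    if first_octet <= 127:
        return "A"   # leading bit 0
    if first_octet <= 191:
        return "B"   # leading bits 10
    if first_octet <= 223:
        return "C"   # leading bits 110
    if first_octet <= 239:
        return "D"   # leading bits 1110, multicast
    return "E"       # leading bits 1111, reserved

for addr in ("134.79.16.1", "127.0.0.1", "210.56.16.10"):
    print(addr, "is class", address_class(addr))
```

Note that 0 and 127 fall in the class A bit pattern even though they are reserved.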
Subnets
• A subnet mask is applied to the host bits to
  determine how the network is subnetted, e.g. if the
  host is: 137.138.28.228, and the subnet mask is
  255.255.255.0 then the right hand 8 bits are for the
  host (255 is decimal for all bits set in an octet)
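• A host address with all bits set indicates a
  broadcast, i.e. the packet is sent to all hosts on the subnet;
  all bits clear identifies the subnet itself (some older
  implementations also treated it as a broadcast).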




                                               12
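The subnet example above can be checked with Python's standard `ipaddress` module. This is a sketch using the address and mask from the slide; the variable names are arbitrary.

```python
import ipaddress

# Host address and subnet mask from the example above
iface = ipaddress.ip_interface("137.138.28.228/255.255.255.0")

network = iface.network                                   # subnet the host is on
host_part = int(iface.ip) & int(iface.network.hostmask)   # right hand 8 bits
broadcast = network.broadcast_address                     # all host bits set

print("subnet:   ", network)
print("host part:", host_part)
print("broadcast:", broadcast)
```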
Subnet Mask Conversions

Prefix Length   Subnet Mask      Prefix Length   Subnet Mask
/1              128.0.0.0        /17             255.255.128.0
/2              192.0.0.0        /18             255.255.192.0
/3              224.0.0.0        /19             255.255.224.0
/4              240.0.0.0        /20             255.255.240.0
/5              248.0.0.0        /21             255.255.248.0
/6              252.0.0.0        /22             255.255.252.0
/7              254.0.0.0        /23             255.255.254.0
/8              255.0.0.0        /24             255.255.255.0
/9              255.128.0.0      /25             255.255.255.128
/10             255.192.0.0      /26             255.255.255.192
/11             255.224.0.0      /27             255.255.255.224
/12             255.240.0.0      /28             255.255.255.240
/13             255.248.0.0      /29             255.255.255.248
/14             255.252.0.0      /30             255.255.255.252
/15             255.254.0.0      /31             255.255.255.254
/16             255.255.0.0      /32             255.255.255.255

Decimal Octet   Binary Number
128             1000 0000
192             1100 0000
224             1110 0000
240             1111 0000
248             1111 1000
252             1111 1100
254             1111 1110
255             1111 1111



                                                                  13
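The conversions in the table follow from setting the leftmost N bits of a 32 bit word. A small sketch (the helper names `prefix_to_mask` and `mask_to_prefix` are made up here):

```python
def prefix_to_mask(prefix_len: int) -> str:
    """Convert a prefix length (1-32) to a dotted-decimal subnet mask."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def mask_to_prefix(mask: str) -> int:
    """Convert a dotted-decimal subnet mask to a prefix length.

    Assumes the mask is contiguous (all set bits are on the left).
    """
    value = 0
    for octet in mask.split("."):
        value = (value << 8) | int(octet)
    return bin(value).count("1")

for n in (8, 16, 21, 24, 30):
    print(f"/{n} = {prefix_to_mask(n)}")
```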
Address depletion
• In 1991 IAB identified 3 dangers
  – Running out of class B addresses
  – Increase in nets has resulted in routing table explosion
  – Increase in net/hosts exhausting 32 bit address space
• Four strategies to address
  – Creative address space allocation {RFC 2050}
  – Private addresses {RFC 1918}, Network Address
    Translation (NAT) {RFC 1631}
  – Classless InterDomain Routing (CIDR) {RFC 1519}
  – IP version 6 (IPv6) {RFC 1883}


                                                   14
Creative IP address allocation
• Class A addresses 64 – 127 reserved
  – Handle on individual basis
• Class B only assigned given a demonstrated need
• Class C
  – divided up into 8 blocks allocated to regional authorities
  – 208-223 remains unassigned and unallocated
• Three main registries handle assignments
  – APNIC – Asia & Pacific www.apnic.net
  – ARIN – N. & S. America, Caribbean & sub-Saharan
    Africa www.arin.net
  – RIPE – Europe and surrounding areas www.ripe.net

                                                   15
Private IP Addresses
• IP addresses that are not globally unique, but used
  exclusively in an organization
• Three ranges:
  – 10.0.0.0 - 10.255.255.255 a single class A net
  – 172.16.0.0 - 172.31.255.255 16 contiguous class Bs
  – 192.168.0.0 – 192.168.255.255 256 contiguous class Cs
• Connectivity provided by Network Address
  Translator (NAT)
  – translates outgoing private IP address to Internet IP
    address, and a return Internet IP address to a private
    address
  – Only for TCP/UDP packets
                                                   16
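A check for the three private ranges can be sketched with the standard `ipaddress` module. The name `is_rfc1918` is hypothetical; the ranges are the ones listed above, written as prefixes.

```python
import ipaddress

# The three RFC 1918 ranges from the slide, in prefix notation
PRIVATE_RANGES = [ipaddress.ip_network(net)
                  for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(address: str) -> bool:
    """True if the address lies in one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_RANGES)

for addr in ("10.1.2.3", "172.31.255.255", "172.32.0.0", "134.79.16.1"):
    print(addr, "private" if is_rfc1918(addr) else "public")
```

A NAT would apply a test of this kind to decide which outgoing source addresses need translating.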
Classless InterDomain Routing (CIDR)
• Many organizations have > 256 computers but few
  have more than several thousand
• Instead of giving a class B (only 16384 such nets exist) give
  sufficient contiguous class C addresses to satisfy
  needs
  – < 256 addresses assign 1 class C
  –…
  – < 8192 addresses assign 32 contiguous Class C nets




                                                17
CIDR & Supernetting
• Since assigned contiguously, class C CIDR has same most
  significant bits & so only needs one routing table entry
• CIDR block represented by a prefix and prefix length
   – Prefix = single address representing block of nets, e.g
      • 192.32.136.0 = 11000000 00100000 10001000 00000000 while
      • 192.32.143.0 = 11000000 00100000 10001111 00000000


                             21 bit prefix (2048 host addresses)
   – Prefix length indicates number of routing bits, e.g.
        192.32.136.0/21 means 21 bits used for routing
      • CIDR collects all nets in range 192.32.136.0 through 143.0 into a single
        router entry – reduces router table entries
• Removes address classes A, B & C boundaries
• For more details see RFC 1519
                                                                   18
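The aggregation in the example can be reproduced with the standard `ipaddress` module. A sketch, using the eight class C sized nets 192.32.136.0 through 192.32.143.0 from the slide:

```python
import ipaddress

# Eight contiguous class C sized networks, 192.32.136.0/24 ... 192.32.143.0/24
class_c_nets = [ipaddress.ip_network(f"192.32.{third}.0/24")
                for third in range(136, 144)]

# collapse_addresses merges contiguous networks into the fewest prefixes
collapsed = list(ipaddress.collapse_addresses(class_c_nets))
block = collapsed[0]

print("routing entries needed:", len(collapsed))
print("CIDR block:", block, "with", block.num_addresses, "addresses")
```

The eight nets share their first 21 bits, so one routing table entry covers them all.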
Address Resolution Protocol (ARP)
• IP address is at network layer, need to map it to the
  MAC (Ethernet address) link layer address
• Use ARP to map 32 bit IP address to 48 bit Ethernet address
  – IP requests MAC address for IP address from local ARP
    table
  – If not there, then an ARP request packet for IP address is
    sent using physical broadcast address (all bits set, FF:FF:FF:FF:FF:FF)
  – Host with requested IP address responds with its MAC
    address as a unicast packet
  – On return, host updates ARP table and returns MAC
    address
  – ARP cache times out
  – ARP packets are on top of Ethernet
                                                  19
ARP cont.
• ARP requests are local only, do not cross routers
               Subnet 1                           Subnet 2
134.79.10.17              134.79.10.1   134.79.15.1          134.79.15.3
 User A                                                        User B

• Compare local IP and subnet mask => local subnet
• Compare local subnet to destination IP
    – if local, ARP for MAC address
    – else remote so
        • if ROUTE entry, ARP for router to subnet
        • if default route, ARP for default gateway
        • otherwise, drop packet & return error
                                                             20
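The decision steps above can be sketched as follows. The function name `arp_target`, the route table format and the /24 mask are assumptions made for illustration; the slide does not give the subnet mask.

```python
import ipaddress

def arp_target(local_if: str, destination: str, routes: dict, default_gw=None):
    """Return the IP address whose MAC address the sender should ARP for.

    local_if   : local address with mask, e.g. "134.79.10.17/24"
    routes     : {"network/prefix": "router address"} explicit ROUTE entries
    default_gw : default gateway address, or None
    """
    iface = ipaddress.ip_interface(local_if)
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:                 # local: ARP for the host itself
        return dest
    for net, router in routes.items():        # remote with a ROUTE entry
        if dest in ipaddress.ip_network(net):
            return ipaddress.ip_address(router)
    if default_gw is not None:                # remote, use the default route
        return ipaddress.ip_address(default_gw)
    raise OSError("no route to destination")  # drop packet & return error

# User A sending to User B on the other subnet (mask /24 assumed)
target = arp_target("134.79.10.17/24", "134.79.15.3",
                    routes={}, default_gw="134.79.10.1")
print("ARP for", target)
```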
Routing
• Routers must select next hop for packet
• Get route information from other routers via a
  routing protocol (RIP, OSPF, EIGRP etc.)
• Note the following are non-routable:
  – private networks: 10.0.0.0/8, 172.16.0.0/12,
    192.168.0.0/16
  – Loopback 127.0.0.0/8




                                                   21
ICMP Purpose (RFC 792)
• Communicates control & error information
  –   Between routers and hosts
  –   Only reports to original source, suggests corrections
  –   Error messages about error messages are not generated
  –   Never generated due to multicasts
• Packet format
       0       8         16        24   31
       Type    Code           Checksum
       ICMP data (depends on type/code)


                                                  22
Main ICMP message types

Type ICMP
0    Echo reply, ping
3    Destination unreachable (code 1 host, code 3 port)
     DF and must fragment (code 4)
4     Source quench
5     Redirect (change a route)
8     Echo request
11    Time exceeded (code 0 ttl=0, code 1 reassembly)
12    Parameter problems

                                               23
ICMP Echo/Ping
• Very commonly used diagnostic tool
• Implementations vary between OSes
• Build echo request
   0        8          16      24    31
   Type=8 Code=0           Checksum
       Identifier      Sequence number
                 Optional data

  – Identifier used to match request to replies (e.g. pid)
  – Sequence number, starts at 0 increments by 1 for each ping packet
     • Used to detect loss, reorder, duplicates
  – Optional data, sent by requester, returned by replier
     • Usually contains a timestamp when the request was sent plus pad data
                                                                 24
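Building the echo request shown above can be sketched as below. The checksum is the standard 16 bit one's complement Internet checksum; the function names, identifier and payload are made up for illustration, and nothing is actually sent on the network.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16 bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (Type=8, Code=0) with a valid checksum."""
    # Checksum is computed with the checksum field set to zero
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

packet = build_echo_request(identifier=0x1234, sequence=0, payload=b"ping")
print(packet.hex())
```

A receiver can verify the packet by running the same checksum over the whole message; a correct message yields 0.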
What do we learn from Ping
• Host reachable
     – Host may respond to ping but not be running services
•   Round trip timing
•   Lost packets
•   Packet reordering, duplicate packets
•   Example:
    13cottrell@noric05:~>ping -c 4 lhr.comsats.net.pk
    PING lhr.comsats.net.pk (210.56.16.10) from 134.79.125.205 : 56(84) bytes of data.
    64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=0 ttl=242 time=716.962 msec
    64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=1 ttl=242 time=720.375 msec
    64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=2 ttl=242 time=725.907 msec
    64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=3 ttl=242 time=710.734 msec

    --- lhr.comsats.net.pk ping statistics ---
    4 packets transmitted, 4 packets received, 0% packet loss
    round-trip min/avg/max/mdev = 710.734/718.494/725.907/5.566 ms
                                                                        25
Unreachable
76cottrell@flora06:~>ping islamabad-server2.comsats.net.pk
ICMP 13 Unreachable from gateway 207.45.205.18
 for icmp from FLORA06.SLAC.Stanford.EDU (134.79.16.101)
to islamabad-server2.comsats.net.pk (210.56.8.8)


What does this mean? See exercise.




                                                 26
Time Exceeded
   0       8             16        24  31
   Type 11 Code               Checksum
                    Unused
        Internet header & 8 bytes of data
• Time-to-live has expired at a router (code=0)
  – ttl sets bound on number of routers datagram can transit
     • Prevents infinite routing loops
     • Initialized by sender, decremented by 1 each time passes router
     • When ttl = 0 datagram thrown away & sender notified by
       ICMP message
• Fragment reassembly timer (code=1)
                                                        27
MTU Discovery
• Path MTUs vary
• Fragmentation is bad
• Small transmission units are bad
• So need to discover optimum MTU (largest without
  fragmentation)
• Host sends a packet with the Don’t Fragment bit set
    – Length is lesser of local MTU and MSS announced by
      remote system
    – If MTU between hosts requires fragmentation (e.g. at an
      intermediate router), then
       • if the DF bit is set & the router must fragment then an ICMP message
         is sent back to source, saying “I can’t fragment”
       • try again with smaller size.
                                                           28
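The probing loop can be simulated without touching the network. This sketch assumes each "can't fragment" reply reports the MTU of the link that blocked the packet (as in RFC 1191); the function name `discover_path_mtu` and the list-of-links model are made up for illustration.

```python
def discover_path_mtu(link_mtus, local_mtu):
    """Simulate path MTU discovery over a list of link MTUs.

    Returns (path_mtu, probes_sent). Every probe has the DF bit set, so the
    first link that is too small answers with an ICMP "can't fragment" error.
    """
    size = local_mtu
    probes = 0
    while True:
        probes += 1
        # First link along the path whose MTU is smaller than the probe
        blocking = next((mtu for mtu in link_mtus if mtu < size), None)
        if blocking is None:
            return size, probes               # probe got through unfragmented
        size = blocking                       # try again with smaller size

# Path from the fragmentation example: MTUs 1500, 620, 1500
path_mtu, probes = discover_path_mtu([1500, 620, 1500], local_mtu=1500)
print(f"path MTU = {path_mtu} bytes after {probes} probes")
```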
User Datagram Protocol - UDP
• RFC 768, Protocol 17
App.        Port 1     Port 2        Port 1   Port 2   Demux on
                                                       Port number
Transport            TCP                  UDP
                                                       Demux on
Network                         IP                     IP protocol
• Provides unreliable, connectionless delivery on top of IP
• Minimal overhead, high performance
   – No setup/teardown, 1 datagram at a time
• Application responsible for reliability
   – Includes datagram loss, duplication, delay, out-of-
     sequence, multiplexing, loss of connectivity
                                                       29
UDP Datagram format
    0               8              16            24           31
    Source port     Destination port
    UDP message len Checksum (opt.)
                  Data
                   …
• Source/destination port: port numbers identify sending & receiving
  processes
   – Port number & IP address allow any application in any computer on Internet to
     be uniquely identified
   – Used to demultiplex datagrams to processes
   – Ports can be static or dynamic
        • Static (< 1024) assigned centrally, known as well known ports
        • Dynamic

• Message length in bytes includes the UDP header and data
                                                        30
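The 8 byte UDP header can be packed and unpacked as sketched below. The function names and the port numbers in the example are assumptions; the checksum is left at 0, which over IPv4 means it was not computed.

```python
import struct

UDP_HEADER_LEN = 8

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend a UDP header; message length covers header plus data."""
    length = UDP_HEADER_LEN + len(payload)
    # Checksum 0 = "not computed" (the field is optional over IPv4)
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload

def parse_udp_datagram(datagram: bytes):
    """Return (src_port, dst_port, length, checksum, data)."""
    src_port, dst_port, length, checksum = struct.unpack(
        "!HHHH", datagram[:UDP_HEADER_LEN])
    return src_port, dst_port, length, checksum, datagram[UDP_HEADER_LEN:length]

# Hypothetical client on dynamic port 40000 sending to well known port 53 (DNS)
datagram = build_udp_datagram(40000, 53, b"query")
print(parse_udp_datagram(datagram))
```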
UDP applications
• Message oriented, e.g. SNMP, DNS, time
• File system, e.g. NFS, AFS
• Lightweight file transfer, e.g. tftp, bootp




                                                31
Transmission Control Protocol -TCP
• RFC 793 & host requirements RFC 1122
  – Reliable stream transport
     • Connection oriented (full duplex virtual circuit)
          – Conceptually place call, two ends communicate to agree on details
          – After agreeing application notified of connection
          – During transfer, ends communicate continuously to verify data received
            correctly
          – When done, ends tear down the connection
          – If UDP is like regular mail, TCP is like phone call
     •   Provides buffering and flow control
     •   Takes care of lost packets, out of order, duplicates, long delays
     •   Isolates application program from network details
     •   Jargon
          – Segment = TCP packet
          – Socket= source (address + port) + destination (address + port)
                                                                   32
TCP layering
App.         Port 1       Port 2         Port 1      Port 2
                                                                  Demux on
Transport               TCP                    UDP                Port number
            IP protocol 6                                         Demux on
 Network                 IP                                       IP protocol
• To ID connection need:
   – Source: (address, port) AND Destination: (address, port)
   – Only need one port on host to allow multiple connections, since
     each connection will have different (host, port) at other end
       • E.g. single host can serve multiple telnet connections
• Passive open: application contacts OS & indicates will
  accept incoming connection, OS assigns port and listens
• Active open: application requests OS to connect to a (host,
  port)
                                                                  33
TCP – providing reliability
• Positive acknowledgement (ACK) with
  retransmission
   – Sender keeps record of each packet sent
   – Sender awaits an ACK
   – Sender starts timer when sends packet
   Sender site        Receiver site
Send pkt 1
                        Rcv pkt 1



                                      Time
                        Send ACK 1
Rcv ACK 1
Send pkt 2
                        Rcv pkt 2
                        Send ACK 2
Rcv ACK 2
             Network messages                  34
TCP – simple lost packet recovery

      Sender site                  Receiver site
   Send pkt 1, start timer         (pkt 1 lost)
                                   Pkt should arrive
   ACK normally arrives            ACK should be sent
   Timer expires
   Retransmit pkt 1, start timer
                                   Rcv pkt 1
                                   Send ACK 1
   Rcv ACK 1


                 Network messages
                                                   35
TCP – improving performance
• BUT simple ACK protocol wastes bandwidth since it must
  delay sending next packet until it gets ACK
• Use sliding window
  Initial window of 4 packets         Window slides

  1   2 3 4 5 6 7 8 …                  1   2 3 4 5 6 7 8 …



         Packets successfully sent             Packets to be sent
                     Packets sent, awaiting ACK
• Sender can send 4 packets of data without ACK
   – When sender gets ACK then can send another packet
   – Window = unacknowledged packets/bytes
                                                         36
Tuning to fill pipe
• Optimal window size depends on:
  – Bandwidth end to end, i.e. min(BWlinks) AKA bottleneck
    bandwidth
  – Round Trip Time (RTT)
  – For TCP keep pipe full
     • Window (sometimes called pipe) ~ RTT*BW
  – Can increase throughput by orders of magnitude
• Windows also used for flow control
[Figure: Src/Rcv timing diagram with labels t = bits in packet/link speed, ACK, RTT]
                                                     37
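The Window ~ RTT*BW rule can be turned into a quick calculation. The 100 Mbit/s bottleneck is a hypothetical figure; the 8192 byte window and the roughly 0.72 s round trip are borrowed from separate examples elsewhere in this lecture, purely for illustration.

```python
def window_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Window needed to keep the pipe full: bandwidth * RTT, in bytes."""
    return bandwidth_bits_per_s * rtt_s / 8

def max_throughput_bits_per_s(window_size_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_size_bytes * 8 / rtt_s

# Hypothetical 100 Mbit/s bottleneck with a 0.72 s round trip time
needed = window_bytes(100e6, 0.72)
print(f"window to fill pipe: {needed / 1e6:.1f} MBytes")

# An 8192 byte window over the same round trip time
limit = max_throughput_bits_per_s(8192, 0.72)
print(f"8192 byte window limits throughput to {limit / 1e3:.0f} kbits/s")
```

The gap between the two numbers is why tuning the window can change throughput by orders of magnitude on long paths.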
Implementation
• Sliding window operates at byte level, NOT packet
      Current window

      1 2 3 4 5 6 7 8 …


                   Highest byte that can be sent        3 pointers
             Highest byte sent
        Bytes sent and acknowledged
• Receiver keeps similar window to put stream back
  together
• Since full duplex, altogether 4 windows & pointer
  sets
                                                   38
TCP flow control
• Windows vary over time
  – Receiver advertises (in ACKs) how many bytes it can receive
     • Based on buffers etc. available
  – Sender adjusts its window to match advertisement
  – If receiver buffers fill, it sends smaller adverts
• Used to match buffer requirements of receiver
• Also used to address congestion control (e.g. in
  intermediate routers)




                                                 39
TCP Segment format
    0   4    8   10    16       24      31
    Source port         Destination port
              Sequence number
         Acknowledgement number
    Hlen Resv Code           Window
        Checksum            Urgent ptr
          Options (if any)        Padding
                 Data if any
                      …
• Source/Dest port: TCP port numbers to ID applications at
  both ends of connection
• Sequence number: ID position in sender’s byte stream
                                                40
TCP segment format – cont.
• Acknowledgement: identifies the number of the
  byte the sender of this segment expects to receive
  next
• Hlen: specifies the length of the segment header in
  32 bit multiples. If there are no options, the Hlen = 5
  (20 bytes)
• Reserved for future use, set to 0
• Code: used to determine segment purpose, e.g.
  SYN, ACK, FIN, URG


                                              41
TCP Segment format- cont
• Window: Advertises how much data this station is
  willing to accept. Can depend on buffer space
  remaining.
• Checksum: Verifies the integrity of the TCP header
  and data. It is mandatory.
• Urgent pointer: used with the URG flag to indicate
  where the urgent data starts in the data stream.
  Typically used with a file transfer abort during FTP
  or when pressing an interrupt key in telnet.
• Options: used for window scaling, SACK,
  timestamps, maximum segment size etc.
                                            42
TCP timeout
• Need a timeout estimate that will work for LANs
  (RTT < 1 msec.) to satellite WANs (hundreds of
  msec. to secs). RTT can vary a lot with time of day,
  day of week, or one second to next.
[Figure: RTT (ms) versus time of day, May 12th]
  –   TCP records time segment sent
  –   and time ACK received
  –   Then calculates RTT sample
  –   Smooth & use to estimate timeout, e.g.
       • Timeout=beta * RTTs
       • Timeout= RTTs + eta{=4}*f(dev(RTTs))
  – Needs to take account of losses, e.g.
       • New_timeout=gamma{=2} * timeout
                                                           43
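The second timeout formula, plus the backoff on loss, can be sketched as below. The slide only fixes eta=4 and gamma=2; the smoothing gains of 1/8 and 1/4, the initial deviation of RTT/2, and the use of mean deviation for f(dev) are assumptions taken from common practice (RFC 6298), and the class name `RtoEstimator` is made up.

```python
class RtoEstimator:
    """Smoothed RTT plus eta * deviation, with multiplicative backoff on loss."""

    def __init__(self, gain=0.125, dev_gain=0.25, eta=4.0, gamma=2.0):
        self.gain = gain          # weight of a new sample in the smoothed RTT
        self.dev_gain = dev_gain  # weight of a new sample in the deviation
        self.eta = eta            # eta{=4} from the slide
        self.gamma = gamma        # gamma{=2} from the slide
        self.srtt = None          # smoothed RTT (RTTs)
        self.rttvar = None        # smoothed mean deviation
        self.timeout = None

    def sample(self, rtt: float) -> float:
        """Feed one RTT sample (seconds) and return the new timeout."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2             # assumed initial deviation
        else:
            # Deviation is updated first, using the old smoothed RTT
            self.rttvar = ((1 - self.dev_gain) * self.rttvar
                           + self.dev_gain * abs(self.srtt - rtt))
            self.srtt = (1 - self.gain) * self.srtt + self.gain * rtt
        self.timeout = self.srtt + self.eta * self.rttvar
        return self.timeout

    def on_loss(self) -> float:
        """Back off after a retransmission timeout."""
        self.timeout *= self.gamma
        return self.timeout

# RTT samples (seconds) taken from the ping example earlier in the lecture
estimator = RtoEstimator()
for rtt in (0.716962, 0.720375, 0.725907, 0.710734):
    print(f"rtt={rtt:.3f}s -> timeout={estimator.sample(rtt):.3f}s")
```

With steady samples the deviation term shrinks, so the timeout settles towards the smoothed RTT.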
TCP connection establishment
• 3 way handshake
  Site 1 (active open)                  Site 2 (passive open)
Send SYN seq=x, Win 4096, mss 1024
                                      Rcv SYN segment
                                      Send SYN seq=y, ACK x+1, Win 4096, mss 1024
Rcv SYN/ACK
Send ACK y+1
                                      Rcv ACK segment
• Initial sequence numbers (x, y) are chosen randomly
• Guarantees both sides ready & know it, and sets
  initial sequence numbers, also sets window & mss
• Once connection established, data can flow in both
  directions, equally well, there is no master or slave
                                                 44
TCP close connection
• Modified 3 way handshake (or 4 way termination)
     Site 1            Site 2
  (App closes)
  Send FIN seq=x
                                Rcv FIN segment
Rcv ACK segment                 Send ACK x+1
                                (inform app)
                                (app closes connection)
Rcv FIN + ACK seg               Send FIN seq=y, ACK x+1
Send ACK y+1
                                Receive ACK segment
• App tells TCP to close, TCP sends remaining data & waits
  for ACK, then sends FIN
• Site 2 TCP ACKs FIN, tells its application “end of data”
• Site 2 sends FIN when its app closes connection (may be
  a long delay, e.g. requiring human interaction).    45
More Information
• Lectures, tutorials etc:
   –   www.nv.cc.va.us/home/joney/tcp_ip.htm
   –   www.cs.pdx.edu/~jrb/tcpip.lectures.html
   –   www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS
   –   www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm
   –   www.cis.ohio-state.edu/htbin/rfc/rfc1180.html
   –   www.jbmelectronics.com/tcp.htm
• Encyclopaedia
   – http://www.freesoft.org/CIE/index.htm
• TCP/IP Resources
   – www.private.org.il/tcpip_rl.html
• Understanding IP addresses
   – http://www.3com.com/solutions/en_US/ncs/501302.html
• Configuring TCP (RFC 1122)
   – ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
• Assigned protocols, ports etc (RFC 1010)
   – http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols

                                                                  46
Example: 3 way handshake
• atlas> telnet sunstats.cern.ch
  – atlas is a Windows NT (WNT) PC, sunstats is a Sun Solaris 5.6 host
  – MSS is set in TCP option in a SYN segment,
    communicates the MSS the sender wants to receive
  – len=ip_hlen/tcp_hlen:ip_total_len
  – Initial Sequence Numbers are randomly selected
  – Telnet = port 23
  – W=Receive window size advertises how much data this
    host will accept



                                               47
Example: 3 way handshake - cont.
• TCP from atlas:1174 to sunstats:23 seq=180839,
  A=0, W=8192, SYN [len=5/6:44, opt=020405B4
  <opt=2, len=4, mss=0x5B4=1460>]
• TCP from sunstats:23 to atlas:1174
  seq=1383568304, A=180840, W=64240, SYN/ACK
  [len=5/6:44, opt=020405B4]
• TCP from atlas:1174 to sunstats:23 seq =180840,
  A=1383568305, W=8760 [len=5/5:40, opt=nul]
  – Notice window size can vary from segment to segment depending
    on buffer space available
  – Notice smaller PC window advertisement
  – Notice ephemeral port selected by telnet client
  – Notice acknowledge next expected byte (=seq+1)
  – 0x020405B4: 02 = option type, 04=len, 0x5B4=1460
                                                     48
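Decoding the option bytes in the trace can be sketched as below. TCP options are a list of kind/length/value entries, with kind 0 (end of list) and kind 1 (no-op) as single bytes; the function name `parse_tcp_options` is made up for illustration.

```python
def parse_tcp_options(options: bytes):
    """Walk the TCP options area and return a list of (kind, value) pairs."""
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                 # end of option list
            break
        if kind == 1:                 # no-operation, a single padding byte
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]       # length covers kind, length and value
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        parsed.append((kind, options[i + 2:i + length]))
        i += length
    return parsed

# Option bytes from the SYN segment in the example above
opts = parse_tcp_options(bytes.fromhex("020405B4"))
kind, value = opts[0]
mss = int.from_bytes(value, "big")
print(f"option kind={kind}, mss={mss}")
```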
Session start
SLAC>CERN: 256kbyte window,1 stream,
full speed > 30msec, 13MBytes in 20s, 5.1MBytes/s

[Figure: time-sequence plot showing the congestion window, receiver advertised window, segments sent and ACKs returned by receiver]
                                                    49

Contenu connexe

Tendances

Tendances (20)

ip addressing & routing
 ip addressing & routing ip addressing & routing
ip addressing & routing
 
Ipv4 and Ipv6
Ipv4 and Ipv6Ipv4 and Ipv6
Ipv4 and Ipv6
 
Ipspk1
Ipspk1Ipspk1
Ipspk1
 
Chap3. i pv4
Chap3. i pv4Chap3. i pv4
Chap3. i pv4
 
Lesson 3: IPv6 Fundamentals
Lesson 3: IPv6 FundamentalsLesson 3: IPv6 Fundamentals
Lesson 3: IPv6 Fundamentals
 
IP Addressing and Subnetting
IP Addressing and SubnettingIP Addressing and Subnetting
IP Addressing and Subnetting
 
Ipv4 and Ipv6
Ipv4 and Ipv6Ipv4 and Ipv6
Ipv4 and Ipv6
 
Ip addressing3
Ip addressing3Ip addressing3
Ip addressing3
 
IP Addressing (Subnetting, VLSM, Supernetting)
IP Addressing (Subnetting, VLSM, Supernetting)IP Addressing (Subnetting, VLSM, Supernetting)
IP Addressing (Subnetting, VLSM, Supernetting)
 
REMnux tutorial 4.1 - Datagrams, Fragmentation & Anomalies
REMnux tutorial 4.1 - Datagrams, Fragmentation & AnomaliesREMnux tutorial 4.1 - Datagrams, Fragmentation & Anomalies
REMnux tutorial 4.1 - Datagrams, Fragmentation & Anomalies
 
Networking basics and basic cisco commands
Networking basics and basic cisco commandsNetworking basics and basic cisco commands
Networking basics and basic cisco commands
 
Cn ipv4 addressing
Cn ipv4 addressingCn ipv4 addressing
Cn ipv4 addressing
 
Visual ip subnetting
Visual ip subnettingVisual ip subnetting
Visual ip subnetting
 
QuickTutorial Guide Advanced Topics in IP Addressing
QuickTutorial Guide Advanced Topics in IP AddressingQuickTutorial Guide Advanced Topics in IP Addressing
Tcpip

  • 1. How the TCP/IP Protocol Works Les Cottrell – SLAC Lecture # 1 presented at the 26th International Nathiagali Summer College on Physics and Contemporary Needs, 25th June – 14th July, Nathiagali, Pakistan Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP 1
  • 2. Overview • This is not a lecture on how to program TCP/IP, rather an introduction to how the major portions work • IP • Addressing: IP addresses, ARP, routing • ICMP • UDP • TCP: flow control, error recovery, establishment, disconnect • References: – “Internetworking with TCP/IP, volume I, principles, protocols & Architecture”, by Douglas Comer – “TCP/IP Illustrated: the protocols”, by W. Richard Stevens – Most information also available free via Web searches 2
  • 3. Internet Protocol (IP RFC-791) TCP/IP Internet provides 3 layers of service Application services Transport Services Connectionless packet delivery service •Layering allows one to replace one service without affecting others •IP layer (basic unit of transfer in TCP/IP) provides: •Best-effort (does not discard capriciously), unreliable (no guarantees) •Packet may be lost, duplicated, out-of-order with no notification •Connectionless (each packet treated independently) •IP software provides routing 3
  • 4. Internet datagram • Basic transfer unit: Datagram header | Datagram data area • Format of Internet datagram (bit positions 0 4 8 16 19 24 31):
      Vers | Hlen | Type of serv. | Total length
      Identification | Flags | Fragment offset
      TTL | Protocol | Header Checksum
      Source IP address
      Destination IP address
      IP Options (if any) | Padding
      Data …
  • 5. IP datagram format (cont.) • Vers (4 bits): version of IP protocol (IPv4=4) • Hlen (4 bits): header length in 32 bit words; without options (the usual case) = 5, i.e. 20 bytes • Type of Service – TOS (8 bits): little used in past, now being used for QoS • Total length (16 bits): length of datagram in bytes, includes header and data • Time to live – TTL (8 bits): specifies how long datagram is allowed to remain in internet – Routers decrement by 1 – When TTL = 0 router discards datagram – Prevents infinite loops • Protocol (8 bits): specifies the format of the data area – Protocol numbers administered by central authority to guarantee agreement, e.g. TCP=6, UDP=17 … 5
  • 6. IP Datagram format (cont.) • Source & destination IP address (32 bits each): contain IP address of sender and intended recipient • Options (variable length): Mainly used to record a route, or timestamps, or specify routing 6
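As a minimal sketch of the header layout described above, the fixed 20 byte part of an IPv4 header can be unpacked with Python's `struct` module. The function name `parse_ipv4_header` and the sample bytes are illustrative assumptions, not part of the lecture.

```python
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (no options)."""
    (ver_hlen, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    hlen_words = ver_hlen & 0x0F          # Hlen is counted in 32-bit words
    return {
        "version": ver_hlen >> 4,
        "hlen_words": hlen_words,
        "header_bytes": hlen_words * 4,
        "tos": tos,
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,                # e.g. 1=ICMP, 6=TCP, 17=UDP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Hypothetical sample header: IPv4, Hlen=5, total length 84, TTL 64, ICMP.
sample = bytes([0x45, 0x00, 0x00, 0x54, 0x00, 0x00, 0x40, 0x00,
                0x40, 0x01, 0x00, 0x00, 134, 79, 16, 1, 127, 0, 0, 1])
header = parse_ipv4_header(sample)
print(header)
```

The checksum field is left at zero in the sample, so this sketch only decodes fields and does not validate the header.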
  • 7. IP Fragmentation • How do we send a datagram of say 1400 bytes through a link that has a Maximum Transmission Unit (MTU) of say 620 bytes? • Answer: the datagram is broken into fragments. Net 1 (MTU=1500), Net 2 (MTU=620), Net 3 (MTU=1500) – Router fragments 1400 byte datagrams • Into 600 bytes, 600 bytes, 200 bytes (note each fragment also carries a 20 byte IP header) • Routers do NOT reassemble; reassembly is left to the end host 7
  • 8. Fragmentation Control • Identification: copied into each fragment, allows destination to know which fragments belong to which datagram • Fragment Offset (13 bits): specifies the offset in the original datagram of the data being carried in the fragment – Measured in units of 8 bytes starting at 0 • Flags (3 bits): control fragmentation – Reserved (0-th bit) – Don’t Fragment – DF (1st bit): • useful for simple (computer bootstrap) applications that can’t handle fragments • also used for MTU discovery (see later) • if a router needs to fragment and can’t, it discards the datagram & sends an error to the source – More Fragments (least sig bit): set on every fragment except the last; when it is clear the receiver knows it has got the last fragment • TCP traffic is hardly ever fragmented (due to use of MTU discovery). About 0.1% - 0.5% of TCP packets are fragmented. 8
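The fragmentation arithmetic can be sketched as follows. This is an illustration only: the function name `fragment` is an assumption, the 1400 bytes are treated as data to be carried (matching the 600/600/200 split on the slide), and a 20 byte header without options is assumed.

```python
def fragment(data_len: int, mtu: int, header_len: int = 20):
    """Return (offset_in_8_byte_units, data_bytes, more_fragments) per fragment."""
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the Fragment Offset field counts in units of 8 bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len   # More Fragments flag
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

frags = fragment(1400, 620)
for offset_units, size, more in frags:
    print(f"offset={offset_units} (byte {offset_units * 8}), data={size}, MF={int(more)}")
```

The last tuple has the More Fragments flag clear, which is how the destination recognises the end of the datagram.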
  • 9. Fragment series composition Offset=0 Offset=1480 Offset=2960 Offset=3440 More frags More frags More frags Last frag NB. If data segment contains its own header that is not replicated 9
  • 10. Internet Addressing • IP address is a 32 bit integer – Refers to interface rather than host – Consists of network and host portions • Enables routers to keep 1 entry/network instead of 1/host – Class A, B, C for unicast – Class D for multicast – Class E reserved – Classless addresses • Written as 4 octets/bytes in decimal format – E.g. 134.79.16.1, 127.0.0.1 10
  • 11. Internet Class-based addresses • Class A: large number of hosts, few networks – 0nnnnnnn hhhhhhhh hhhhhhhh hhhhhhhh • 7 network bits (0 and 127 reserved, so 126 networks), 24 host bits (> 16M hosts/net) • Initial byte 1-126 (decimal) • Class B: medium number of hosts and networks – 10nnnnnn nnnnnnnn hhhhhhhh hhhhhhhh • 16,384 class B networks, 65,534 hosts/network • Initial byte 128-191 (decimal) • Class C: large number of small networks – 110nnnnn nnnnnnnn nnnnnnnn hhhhhhhh • 2,097,152 networks, 254 hosts/network • Initial byte 192-223 (decimal) • Class D: 224-239 (decimal) Multicast [RFC1112] • Class E: 240-255 (decimal) Reserved 11
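The class boundaries above depend only on the leading bits of the first octet, so a classifier can be sketched in a few lines. The helper name `address_class` is hypothetical; the thresholds follow the bit patterns 0, 10, 110, 1110 listed on the slide.

```python
def address_class(address: str) -> str:
    """Return the historical class (A-E) of a dotted-decimal IPv4 address."""
    first = int(address.split(".")[0])
    if first < 128:      # leading bit 0
        return "A"
    if first < 192:      # leading bits 10
        return "B"
    if first < 224:      # leading bits 110
        return "C"
    if first < 240:      # leading bits 1110 (multicast)
        return "D"
    return "E"           # reserved

classes = {addr: address_class(addr)
           for addr in ("10.1.2.3", "134.79.16.1", "210.56.16.10", "224.0.0.1")}
print(classes)
```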
  • 12. Subnets • A subnet mask is applied to the address to determine how the network is subnetted, e.g. if the host is 137.138.28.228 and the subnet mask is 255.255.255.0, then the right hand 8 bits are for the host (255 is decimal for all bits set in an octet) • A host part with all bits set indicates a broadcast, i.e. the packet is sent to all hosts on the subnet; a host part with no bits set identifies the subnet itself (some older implementations also treated it as a broadcast). 12
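The subnet example can be checked with Python's standard `ipaddress` module. This is a small sketch using the address and mask from the slide; the variable names are illustrative.

```python
import ipaddress

# Address and mask taken from the slide's example.
iface = ipaddress.ip_interface("137.138.28.228/255.255.255.0")
network = iface.network

# Host part = address bits not covered by the mask.
host_part = int(iface.ip) & int(network.hostmask)

print("subnet:   ", network)
print("host part:", host_part)
print("broadcast:", network.broadcast_address)
```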
  • 13. Prefix Subnet Mask Conversions
      Prefix length   Subnet mask     Prefix length   Subnet mask
      /1              128.0.0.0       /17             255.255.128.0
      /2              192.0.0.0       /18             255.255.192.0
      /3              224.0.0.0       /19             255.255.224.0
      /4              240.0.0.0       /20             255.255.240.0
      /5              248.0.0.0       /21             255.255.248.0
      /6              252.0.0.0       /22             255.255.252.0
      /7              254.0.0.0       /23             255.255.254.0
      /8              255.0.0.0       /24             255.255.255.0
      /9              255.128.0.0     /25             255.255.255.128
      /10             255.192.0.0     /26             255.255.255.192
      /11             255.224.0.0     /27             255.255.255.224
      /12             255.240.0.0     /28             255.255.255.240
      /13             255.248.0.0     /29             255.255.255.248
      /14             255.252.0.0     /30             255.255.255.252
      /15             255.254.0.0     /31             255.255.255.254
      /16             255.255.0.0     /32             255.255.255.255

      Decimal octet   Binary number
      128             1000 0000
      192             1100 0000
      224             1110 0000
      240             1111 0000
      248             1111 1000
      252             1111 1100
      254             1111 1110
      255             1111 1111
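Each row of the conversion table follows from setting the top N bits of a 32 bit word. A minimal sketch, with the hypothetical helper name `prefix_to_mask`:

```python
def prefix_to_mask(prefix_len: int) -> str:
    """Convert a prefix length (1..32) to a dotted-decimal subnet mask."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

for n in (1, 8, 21, 24, 30):
    print(f"/{n} -> {prefix_to_mask(n)}")
```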
  • 14. Address depletion • In 1991 IAB identified 3 dangers – Running out of class B addresses – Increase in nets has resulted in routing table explosion – Increase in net/hosts exhausting 32 bit address space • Four strategies to address – Creative address space allocation {RFC 2050} – Private addresses {RFC 1918}, Network Address Translation (NAT) {RFC 1631} – Classless InterDomain Routing (CIDR) {RFC 1519} – IP version 6 (IPv6) {RFC 1883} 14
  • 15. Creative IP address allocation • Class A addresses 64 – 127 reserved – Handle on individual basis • Class B only assigned given a demonstrated need • Class C – divided up into 8 blocks allocated to regional authorities – 208-223 remains unassigned and unallocated • Three main registries handle assignments – APNIC – Asia & Pacific www.apnic.net – ARIN – N. & S. America, Caribbean & sub-Saharan Africa www.arin.net – RIPE – Europe and surrounding areas www.ripe.net 15
  • 16. Private IP Addresses • IP addresses that are not globally unique, but used exclusively in an organization • Three ranges: – 10.0.0.0 - 10.255.255.255 a single class A net – 172.16.0.0 - 172.31.255.255 16 contiguous class Bs – 192.168.0.0 – 192.168.255.255 256 contiguous class Cs • Connectivity provided by Network Address Translator (NAT) – translates outgoing private IP address to Internet IP address, and a return Internet IP address to a private address – Only for TCP/UDP packets 16
  • 17. Classless InterDomain Routing (CIDR) • Many organizations have > 256 computers but few have more than several thousand • Instead of giving a class B (only 16,384 such nets exist), give sufficient contiguous class C addresses to satisfy needs – < 256 addresses assign 1 class C –… – < 8192 addresses assign 32 contiguous Class C nets 17
  • 18. CIDR & Supernetting • Since assigned contiguously, class C CIDR has same most significant bits & so only needs one routing table entry • CIDR block represented by a prefix and prefix length – Prefix = single address representing block of nets, e.g • 192.32.136.0 = 11000000 00100000 10001000 00000000 while • 192.32.143.0 = 11000000 00100000 10001111 00000000 21 bit prefix (2048 host addresses) – Prefix length indicates number of routing bits, e.g. 192.32.136.0/21 means 21 bits used for routing • CIDR collects all nets in range 192.32.136.0 through 143.0 into a single router entry – reduces router table entries • Removes address classes A, B & C boundaries • For more details see RFC 1519 18
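The supernetting example can be reproduced with the standard `ipaddress` module: the eight contiguous class C networks 192.32.136.0 through 192.32.143.0 collapse into one /21 entry. The variable names are illustrative.

```python
import ipaddress

# The eight contiguous class C nets from the slide.
class_c_nets = [ipaddress.ip_network(f"192.32.{third}.0/24")
                for third in range(136, 144)]

# collapse_addresses merges adjacent networks into the fewest prefixes.
collapsed = list(ipaddress.collapse_addresses(class_c_nets))

print(collapsed)                       # a single routing entry
print(collapsed[0].num_addresses)      # host addresses covered by the block
```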
  • 19. Address Resolution Protocol (ARP) • IP address is at network layer, need to map it to the link layer MAC (e.g. Ethernet) address • Use ARP to map a 32 bit IP address to a 48 bit Ethernet address – IP requests MAC address for IP address from local ARP table – If not there, then an ARP request packet for the IP address is sent using the physical broadcast address (all bits set, FF:FF:FF:FF:FF:FF) – Host with requested IP address responds with its MAC address as a unicast packet – On return, host updates ARP table and returns MAC address – ARP cache entries time out – ARP packets are carried directly on top of Ethernet 19
  • 20. ARP cont. • ARP requests are local only, do not cross routers Subnet 1 Subnet 2 134.79.10.17 134.79.10.1 134.79.15.1 134.79.15.3 User A User B • Compare local IP and subnet mask => local subnet • Compare local subnet to destination IP – if local, ARP for MAC address – else remote so • if ROUTE entry, ARP for router to subnet • if default route, ARP for default gateway • otherwise, drop packet & return error 20
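The decision procedure above (local subnet, explicit route, default route, else drop) can be sketched as a function that returns the IP address a host would ARP for. The function name `arp_target`, the /24 mask and the route table contents are assumptions for illustration; the slide does not state the mask.

```python
import ipaddress

def arp_target(local, dest, routes, default_gateway=None):
    """Return the IP address to ARP for, or None if the packet is dropped."""
    dest_ip = ipaddress.ip_address(dest)
    if dest_ip in local.network:
        return dest_ip                 # destination is on the local subnet
    for net, gateway in routes.items():
        if dest_ip in net:
            return gateway             # explicit route: ARP for that router
    return default_gateway             # default route, or None -> drop

# Assumed configuration loosely based on the slide's diagram (mask assumed /24).
local = ipaddress.ip_interface("134.79.10.17/24")
routes = {ipaddress.ip_network("134.79.15.0/24"): ipaddress.ip_address("134.79.10.1")}

print(arp_target(local, "134.79.10.20", routes))   # local host
print(arp_target(local, "134.79.15.3", routes))    # via router to subnet 2
print(arp_target(local, "210.56.16.10", routes))   # no route, no default
```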
  • 21. Routing • Routers must select next hop for packet • Get route information from other routers via a routing protocol (RIP, OSPF, EIGRP etc.) • Note the following are non-routable: – private networks: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 – Loopback 127.0.0.0/8 21
  • 22. ICMP Purpose (RFC 792) • Communicates control & error information – Between routers and hosts – Only reports to original source, suggests corrections – Error messages about error messages are not generated – Never generated due to multicasts • Packet format 0 8 16 24 31 Type Code Checksum ICMP data (depends on type/code) 22
  • 23. Main ICMP request types
      Type   ICMP message
      0      Echo reply (ping)
      3      Destination unreachable (code 1 host, code 3 port, code 4 DF set and must fragment)
      4      Source quench
      5      Redirect (change a route)
      8      Echo request
      11     Time exceeded (code 0 ttl=0, code 1 reassembly)
      12     Parameter problems
  • 24. ICMP Echo/Ping • Very commonly used diagnostic tool • Implementations vary between OS’ • Build echo request 0 8 16 24 31 Type=8 Code=0 Checksum Identifier Sequence number Optional data – Identifier used to match request to replies (e.g. pid) – Sequence number, starts at 0 increments by 1 for each ping packet • Used to detect loss, reorder, duplicates – Optional data, sent by requester, returned by replier • Usually contains a timestamp when the request was sent plus pad data 24
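To make the echo request layout concrete, here is a sketch that builds the ICMP message bytes and fills in the standard Internet checksum (one's complement of the one's complement sum of 16 bit words). It only constructs the bytes; sending them would need a raw socket and usually elevated privileges. The function names and the payload are illustrative assumptions.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

packet = build_echo_request(identifier=0x1234, sequence=0, payload=b"abcd")
print(packet.hex())
```

A receiver can verify the message by running the same checksum over the whole packet, which yields zero when the message is intact.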
  • 25. What do we learn from Ping • Host reachable – Host may respond to ping but not be running services • Round trip timing • Lost packets • Packet reordering, duplicate packets • Example:
      13cottrell@noric05:~>ping -c 4 lhr.comsats.net.pk
      PING lhr.comsats.net.pk (210.56.16.10) from 134.79.125.205 : 56(84) bytes of data.
      64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=0 ttl=242 time=716.962 msec
      64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=1 ttl=242 time=720.375 msec
      64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=2 ttl=242 time=725.907 msec
      64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=3 ttl=242 time=710.734 msec
      --- lhr.comsats.net.pk ping statistics ---
      4 packets transmitted, 4 packets received, 0% packet loss
      round-trip min/avg/max/mdev = 710.734/718.494/725.907/5.566 ms
  • 26. Unreachable 76cottrell@flora06:~>ping islamabad-server2.comsats.net.pk ICMP 13 Unreachable from gateway 207.45.205.18 for icmp from FLORA06.SLAC.Stanford.EDU (134.79.16.101) to islamabad-server2.comsats.net.pk (210.56.8.8) What does this mean, see exercise? 26
  • 27. Time Exceeded 0 8 16 24 31 Type 11 Code Checksum Unused Internet header & 8 bytes of data • Time-to-live has expired at a router (code=0) – ttl sets bound on number of routers a datagram can transit • Prevents infinite routing loops • Initialized by sender, decremented by 1 each time it passes a router • When ttl = 0 datagram thrown away & sender notified by ICMP message • Fragment reassembly timer expired (code=1) 27
  • 28. MTU Discovery • Path MTUs vary • Fragmentation is bad • Small transmission units are bad • SO need to discover optimum MTU (largest without fragmentation) • Host sends a packet with the Don’t Fragment bit set – Length is lesser of local MTU and MSS announced by remote system – If MTU between hosts requires fragmentation (e.g. at an intermediate router), then • since the DF bit is set & the router must fragment, the router discards the packet and sends an ICMP message (type 3, code 4) back to the source, saying “I can’t fragment” • source tries again with a smaller size. 28
  • 29. User Datagram Protocol - UDP • RFC 768, Protocol 17 App. Port 1 Port 2 Port 1 Port 2 Demux on Port number Transport TCP UDP Demux on Network IP IP protocol • Provides unreliable, connectionless delivery on top of IP • Minimal overhead, high performance – No setup/teardown, 1 datagram at a time • Application responsible for reliability – Includes datagram loss, duplication, delay, out-of-sequence, multiplexing, loss of connectivity 29
  • 30. UDP Datagram format 0 8 16 24 31 Source port Destination port UDP message len Checksum (opt.) Data … • Source/destination port: port numbers identify sending & receiving processes – Port number & IP address allow any application in any computer on Internet to be uniquely identified – Used to demultiplex datagrams to processes – Ports can be static or dynamic • Static (< 1024) assigned centrally, known as well known ports • Dynamic • Message length in bytes includes the UDP header and data 30
  • 31. UDP applications • Message oriented, e.g. SNMP, DNS, time • File system, e.g. NFS, AFS • Lightweight file transfer, e.g. tftp, bootp 31
  • 32. Transmission Control Protocol - TCP • RFC 793 & host requirements RFC 1122 – Reliable stream transport • Connection oriented (full duplex virtual circuit) – Conceptually place call, two ends communicate to agree on details – After agreeing, application notified of connection – During transfer, ends communicate continuously to verify data received correctly – When done, ends tear down the connection – If UDP is like regular mail, TCP is like phone call • Provides buffering and flow control • Takes care of lost packets, out of order, duplicates, long delays • Isolates application program from network details • Jargon – Segment = TCP packet – Socket = source (address + port) + destination (address + port) 32
  • 33. TCP layering App. Port 1 Port 2 Port 1 Port 2 Demux on Transport TCP UDP Port number IP port 6 Demux on Network IP IP protocol • To ID connection need: – Source: (address, port) AND Destination: (address, port) – Only need one port on host to allow multiple connections, since each connection will have different (host, port) at other end • E.g. single host can serve multiple telnet connections • Passive open: application contacts OS & indicates will accept incoming connection, OS assigns port and listens • Active open: application requests OS to connect to an (host, port) 33
  • 34. TCP – providing reliability • Positive acknowledgement (ACK) with retransmission – Sender keeps record of each packet sent – Sender awaits an ACK – Sender starts timer when sends packet Sender site Receiver site Send pkt 1 Rcv pkt 1 Time Send ACK 1 Rcv ACK 1 Send pkt 2 Rcv pkt 2 Send ACK 2 Rcv ACK 2 Network messages 34
  • 35. TCP – simple lost packet recovery Sender site Receiver site Send pkt 1 Loss Start timer Pkt should arrive ACK normally ACK should be sent arrives Timer expires Retransmit pkt 1 start timer Rcv pkt 1 Send ACK 1 Rcv ACK 1 Network messages 35
  • 36. TCP – improving performance • BUT simple ACK protocol wastes bandwidth since it must delay sending next packet until it gets ACK • Use sliding window Initial window of 4 packets Window slides 1 2 3 4 5 6 7 8 … 1 2 3 4 5 6 7 8 … Packets successfully sent Packets to be sent Packets sent, awaiting ACK • Sender can send 4 packets of data without ACK – When sender gets ACK then can send another packet – Window = unacknowledged packets/bytes 36
  • 37. Tuning to fill pipe • Optimal window size depends on: – Bandwidth end to end, i.e. min(BWlinks) AKA bottleneck bandwidth – Round Trip Time (RTT) – For TCP keep pipe full • Window (sometimes called pipe) ~ RTT*BW – Can increase bandwidth by orders of magnitude • Windows also used for flow control (Figure: Src to Rcv timeline showing t = bits in packet/link speed, the returning ACK, and the RTT) 37
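The relation Window ~ RTT*BW can be turned into a quick calculation. The sketch below uses integer arithmetic (bits per second and milliseconds); the function names and the example figures of 10 Mbit/s, 720 ms and an 8192 byte window are assumptions chosen for illustration.

```python
def window_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep the pipe full (RTT * BW)."""
    return bandwidth_bps * rtt_ms // 8000     # /1000 for ms, /8 for bits->bytes

def max_throughput_bps(window: int, rtt_ms: int) -> int:
    """Upper bound on throughput when at most `window` bytes are unacknowledged."""
    return window * 8 * 1000 // rtt_ms

# Assumed example: 10 Mbit/s bottleneck, 720 ms round trip.
needed = window_bytes(10_000_000, 720)
# Assumed example: an 8192 byte window over the same path.
limit = max_throughput_bps(8192, 720)
print(needed, "bytes needed;", limit, "bit/s achievable with an 8192 byte window")
```

With these assumed numbers the small window, not the link, sets the ceiling, which is why window tuning can change throughput by orders of magnitude on long paths.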
  • 38. Implementation • Sliding window operates at byte level, NOT packet Current window 1 2 3 4 5 6 7 8 … Highest byte that can be sent 3 pointers Highest byte sent Bytes sent and acknowledged • Receiver keeps similar window to put stream back together • Since full duplex, altogether 4 windows & pointer sets 38
  • 39. TCP flow control • Windows vary over time – Receiver advertises (in ACKs) how many it can receive • Based on buffers etc. available – Sender adjusts its window to match advertisement – If receiver buffers fill, it sends smaller adverts • Used to match buffer requirements of receiver • Also used to address congestion control (e.g. in intermediate routers) 39
  • 40. TCP Segment format (bit positions 0 4 8 10 16 24 31):
      Source port | Destination port
      Sequence number
      Acknowledgement number
      Hlen | Resv | Code | Window
      Checksum | Urgent ptr
      Options (if any) | Padding
      Data if any …
    • Source/Dest port: TCP port numbers to ID applications at both ends of connection • Sequence number: ID position in sender’s byte stream
  • 41. TCP segment format – cont. • Acknowledgement: identifies the number of the byte the sender of this segment expects to receive next • Hlen: specifies the length of the segment header in 32 bit multiples. If there are no options, the Hlen = 5 (20 bytes) • Reserved for future use, set to 0 • Code: used to determine segment purpose, e.g. SYN, ACK, FIN, URG 41
  • 42. TCP Segment format- cont • Window: Advertises how much data this station is willing to accept. Can depend on buffer space remaining. • Checksum: Verifies the integrity of the TCP header and data. It is mandatory. • Urgent pointer: used with the URG flag to indicate where the urgent data starts in the data stream. Typically used with a file transfer abort during FTP or when pressing an interrupt key in telnet. • Options: used for window scaling, SACK, timestamps, maximum segment size etc. 42
  • 43. TCP timeout • Need a timeout estimate that will work for LANs (RTT < msec.) to satellite WANs (hundreds of msec. to secs). RTT can vary a lot with time of day, day of week, or one second to next. (Figure: RTT in ms. versus time of day, May 12th) – TCP records time segment sent – and time ACK received – Then calculates RTT sample – Smooth & use to estimate timeout, e.g. • Timeout = beta * RTTs • Timeout = RTTs + eta (=4) * f(dev(RTTs)) – Needs to take account of losses, e.g. • New_timeout = gamma (=2) * timeout 43
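One common way to realise "Timeout = RTTs + 4 * deviation" with doubling on loss is the smoothed estimator standardised in RFC 6298. The sketch below assumes that scheme; the gains 1/8 and 1/4 and the class name `RtoEstimator` are not taken from the slides and are labelled as assumptions.

```python
class RtoEstimator:
    """Smoothed RTT and deviation tracker (gains assumed from RFC 6298)."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25, k: float = 4.0):
        self.alpha, self.beta, self.k = alpha, beta, k
        self.srtt = None      # smoothed RTT ("RTTs" on the slide)
        self.rttvar = None    # smoothed deviation
        self.timeout = None

    def sample(self, rtt: float) -> float:
        """Feed one RTT measurement and return the new timeout."""
        if self.srtt is None:                 # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        self.timeout = self.srtt + self.k * self.rttvar
        return self.timeout

    def on_timeout(self, gamma: float = 2.0) -> float:
        """Back off after a loss: New_timeout = gamma * timeout."""
        self.timeout *= gamma
        return self.timeout

est = RtoEstimator()
first = est.sample(100.0)      # milliseconds
second = est.sample(100.0)
backed_off = est.on_timeout()
print(first, second, backed_off)
```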
  • 44. TCP connection establishment • 3 way handshake: Site 1 (active): Send SYN seq=x, Win 4096, mss 1024 → Site 2 (passive): Rcv SYN segment, then Send SYN seq=y, ACK x+1, Win 4096, mss 1024 → Site 1: Rcv SYN/ACK, then Send ACK y+1 → Site 2: Rcv ACK segment • Initial sequence numbers (x, y) are chosen randomly • Guarantees both sides ready & know it, and sets initial sequence numbers, also sets window & mss • Once connection established, data can flow in both directions, equally well, there is no master or slave 44
  • 45. TCP close connection • Modified 3 way handshake (or 4 way termination): Site 1: (App closes) Send FIN seq=x → Site 2: Rcv FIN segment, Send ACK x+1 (inform app) → Site 1: Rcv ACK segment → Site 2: (app closes connection) Send FIN seq=y, ACK x+1 → Site 1: Rcv FIN + ACK seg, Send ACK y+1 → Site 2: Receive ACK segment • App tells TCP to close, TCP sends remaining data & waits for ACK, then sends FIN • Site 2 TCP ACKs FIN, tells its application “end of data” • Site 2 sends FIN when its app closes connection (may be a long delay, e.g. if it requires human interaction). 45
  • 46. More Information • Lectures, tutorials etc: – www.nv.cc.va.us/home/joney/tcp_ip.htm – www.cs.pdx.edu/~jrb/tcpip.lectures.html – www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS – www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm – www.cis.ohio-state.edu/htbin/rfc/rfc1180.html – www.jbmelectronics.com/tcp.htm • Encyclopaedia – http://www.freesoft.org/CIE/index.htm • TCP/IP Resources – www.private.org.il/tcpip_rl.html • Understanding IP addresses – http://www.3com.com/solutions/en_US/ncs/501302.html • Configuring TCP (RFC 1122) – ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt • Assigned protocols, ports etc (RFC 1010) – http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols 46
  • 47. Example: 3 way handshake • atlas> telnet sunstats.cern.ch – atlas is a WNT PC, sunstats is a Sun Solaris 5.6 host – MSS is set in TCP option in a SYN segment, communicates the MSS the sender wants to receive – len=ip_hlen/tcp_hlen:ip_total_len – Initial Sequence Numbers are randomly selected – Telnet = port 23 – W=Receive window size advertises how much data this host will accept 47
  • 48. Example: 3 way handshake - cont. • TCP from atlas:1174 to sunstats:23 seq=180839, A=0, W=8192, SYN [len=5/6:44, opt=020405B4 <opt=2, len=4, mss=0x5B4=1460>] • TCP from sunstats:23 to atlas:1174 seq=1383568304, A=180840, W=64240, SYN/ACK [len=5/6:44, opt=020405B4] • TCP from atlas:1174 to sunstats:23 seq =180840, A=1383568305, W=8760 [len=5/5:40, opt=nul] – Notice window size can vary from segment to segment depending on buffer space available – Notice smaller PC window advertisement – Notice ephemeral port selected by telnet client – Notice acknowledge next expected byte (=seq+1) – 0x020405B4: 02 = option type, 04=len, 0x5B4=1460 48
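The option bytes in the trace can be decoded with a small kind/length/value walker. This is a sketch under the usual TCP option encoding (kind 0 ends the list, kind 1 is a one byte no-op, other kinds carry a length byte); the function name `parse_tcp_options` is hypothetical.

```python
def parse_tcp_options(raw: bytes):
    """Return a list of (kind, value_bytes) from a TCP options field."""
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:                 # end of option list
            break
        if kind == 1:                 # no-operation padding, single byte
            i += 1
            continue
        if i + 1 >= len(raw):         # truncated option, stop parsing
            break
        length = raw[i + 1]           # length covers kind + length + value
        if length < 2:
            break                     # malformed, stop parsing
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options

# Option bytes from the SYN segments in the trace above.
opts = parse_tcp_options(bytes.fromhex("020405B4"))
kind, value = opts[0]
mss = int.from_bytes(value, "big")
print(kind, mss)

# The acknowledgement number is the next expected byte: the SYN consumes one.
assert 180839 + 1 == 180840
```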
  • 49. Session start SLAC>CERN: 256kbyte window,1 stream, full speed > 30msec, 13MBytes in 20s, 5.1MBytes/s Congestion window Rcvr Advertised window Segments sent Acks returned by Rcvr 49

Editor's notes

  1. How do we measure the QoS Introduction to PingER and active end-to-end measurement methodology Problem areas illustrated by results from PingER: Generally, e.g. S. America, Spain, China, Germany to .edu & .ca How do E. Europe & Russia look? How does performance affect applications Validating ping measurements and impact on FTP & Web performance Overview of impact of performance on applications including email, web, FTP, interactive apps Detailed look at bulk data transfer expectations for HENP sites Detailed look at critical performance metrics (RTT, loss, jitter, availability) and impact on VoIP What can be done to improve QoS: More bandwidth Reserved bandwidth Differentiated services
  2. Need routing to get message back to origin
  3. The address range from 0.0.0.0 through 0.255.255.255 should not be considered part of the normal Class A range. 0.x.x.x addresses serve no particular function in IP, but nodes attempting to use them will be unable to communicate properly on the Internet. 127.0.0.1 is the loopback test mechanism of network adapters. Messages sent to 127.0.0.1 do not get delivered to the network. Instead, the adapter intercepts all loopback messages and returns them to the sending application. IP applications often use this feature to test the behavior of their network interface. 127.0.0.0 through 127.255.255.255 are reserved for loopback. 224-239 are used for multicast (see http://www.firewall.cx/multicast-intro.php, also Google IGMP & PIM). The range of addresses between 224.0.0.0 and 224.0.0.255, inclusive, is reserved for the use of routing protocols and other low-level topology discovery or maintenance protocols, such as gateway discovery and group membership reporting. Multicast routers should not forward any multicast datagram with destination addresses in this range, regardless of its TTL. 255.0.0.0 through 255.255.255.255 are reserved for IP broadcast
  4. Class B addresses require demonstrated need: subnetting plan for > 32 subnets, > 4096 hosts. Class C blocks: 192-193 Multiregional, 194-195 Europe, 196-197 Others, 198-199 N. America, 200-201 Central/South America, 202-203 Pacific Rim, 204-207 Reserved. APNIC = Asia Pacific Network Information Center, ARIN = American Registry for Internet Numbers, RIPE NCC = Réseaux IP Européens Network Coordination Centre
  5. Must adhere to: Cannot be referenced by hosts in another organization Cannot be defined to any external router Cannot be advertised addresses, and cannot forward IP datagrams containing those address to external routers External routers will quietly discard all routing information regarding these addresses. Multicast is in the class D range 224.0.0.0 to 239.255.255.255 or 224.0.0.0/4
  6. Removes the address class A, B, C boundaries. Networks that keep those boundaries are called classful networks
  7. Why restrict communication to original source: the datagram only contains the original source & ultimate destination, it does not contain the complete itinerary of the route taken. Since routing is dynamic, a router cannot know which path a datagram has taken or will take.
  8. Use the Internet to find out what PING stands for.
  9. Low overhead since: no set up or tear down, deals with only one datagram at a time