5. What is latency?
• Latency impacts the user experience
• Lower latency = more responsive = better
experience
• A fast download over a high-latency link can take
longer than a slow download over a low-latency
link
6. Why measure latency?
• Efficiency:
• Improved resource usage
• Improved user experience
• Spotting and diagnosing defects
7. Where is Latency?
• Between:
• A CPU and its cache
• Client and server over a network
• Application and disk
• Anywhere a system does work
8. Where is latency?
• L1 cache reference 0.5 ns
• Branch mispredict 5 ns
• L2 cache reference 7 ns
• Mutex lock/unlock 100 ns
• Main memory reference 100 ns
• Compress 1K bytes with Zippy 10,000 ns
• Send 2K bytes over 1 Gbps network 20,000 ns
• Read 1 MB sequentially from memory 250,000 ns
• Round trip within same datacenter 500,000 ns
• Disk seek 10,000,000 ns
• Read 1 MB sequentially from network 10,000,000 ns
• Read 1 MB sequentially from disk 30,000,000 ns
• Send packet CA->Netherlands->CA 150,000,000 ns
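The scale of these numbers is easy to check for yourself. A minimal sketch (in Python, so absolute values will be far larger than the hardware figures above, but the relative orders of magnitude still show):

```python
import time

def measure_ns(fn, iterations=100_000):
    """Average wall-clock latency of fn() in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations

cache = {"key": "value"}

# An in-memory lookup sits at the fast end of the table above;
# anything touching disk or network will be orders of magnitude slower.
dict_ns = measure_ns(lambda: cache["key"])
print(f"dict lookup: ~{dict_ns:.0f} ns")
```

Measuring over many iterations averages out timer resolution and scheduling noise.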
9. Causes of network latency
• Physical limitations - speed of light, wire speeds
• Congestion at switches, routers and servers
• Packet loss due to noise, congestion, faults
10. Round Trip Times
• aka RTT
• Time to go there and back again
• Return route may be different from the outbound
11. Network Latency Tools
• Ping. Time between sending ICMP Echo Request and
receiving ICMP Echo Reply
• Traceroute. Time between sending a packet with an
incremented TTL value and receiving an ICMP Time
Exceeded packet
• tcptraceroute. traceroute using TCP packets to
configurable ports
• mtr - does ICMP, UDP and TCP traceroute
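The same idea behind tcptraceroute's per-hop timing can be sketched in a few lines: time a TCP connect, which completes after the three-way handshake, so the elapsed time is roughly one round trip. This demo runs against a local listener so it needs no network access:

```python
import socket
import time

def tcp_connect_rtt(host, port, timeout=2.0):
    """Time the TCP three-way handshake (~one RTT), in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0

# Self-contained demo: a throwaway local listener on an ephemeral port.
server = socket.socket()
server.bind(("127.0.0.1", 0))        # OS picks a free port
server.listen(1)
port = server.getsockname()[1]

rtt_ms = tcp_connect_rtt("127.0.0.1", port)
print(f"handshake took {rtt_ms:.3f} ms")
server.close()
```

Against a remote host this measures data-path RTT rather than ICMP, which sidesteps the deprioritisation problem mentioned later in the notes.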
13. Transmission Control
Protocol
• Stateful, connection oriented protocol for reliable
data transmission
• Guarantees data delivery and ordering
• Servers maintain state tables of connections
• HTTP, SMTP, SSL/TLS, IRC, SSH…
14. TCP
• Three-way handshake. 1.5 round trips to set up a
connection
15. TCP Latency Improvements
• By reducing number of round trips:
• Compress content into fewer packets. 1500 MTU
= 1460-byte payload
• TCP timestamps take an extra 12 bytes = 1448
byte payload. Timestamp can be disabled.
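The payload arithmetic above translates directly into packet counts. A small sketch (header sizes from the slide; the function name is illustrative):

```python
import math

MTU = 1500
IP_TCP_HEADERS = 40          # 20-byte IP header + 20-byte TCP header
TIMESTAMP_OPTION = 12        # TCP timestamps option, if enabled

def packets_needed(content_bytes, timestamps=True):
    """Packets required to carry content_bytes of payload."""
    payload = MTU - IP_TCP_HEADERS - (TIMESTAMP_OPTION if timestamps else 0)
    return math.ceil(content_bytes / payload)

print(packets_needed(100_000))            # 1448-byte payload per packet
print(packets_needed(100_000, False))     # 1460-byte payload per packet
```

For a 100 KB response the timestamp option costs one extra packet here; compression that shrinks the content usually saves far more than header tuning.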
16. TCP Improvements
• Move your content closer to your users:
• Make good use of local caches (e.g. browser)
• Content Delivery Networks (Cloudflare,
Cloudfront, Akamai)
• Host geographically closely
• Host at locations with low latency links
17. HTTP Latency
• Use HTTP/1.1, HTTP/2 (née SPDY)
• Ensure pipelining is enabled
• Tune TCP keep alive
• Try TCP corking (buffer the stream, then send)
and TCP_NODELAY (send small payloads without
Nagle buffering)
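Both knobs are ordinary socket options. A minimal sketch of setting them (TCP_CORK is Linux-only, hence the guard):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm: small writes leave immediately instead of
# being coalesced into larger segments (lower latency, more packets).
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# TCP_CORK (Linux-only) is the opposite trade: hold partial frames until
# uncorked, so a header and body written separately share one packet.
if hasattr(socket, "TCP_CORK"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)

nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(nodelay)
sock.close()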
18. HTTP Latency
• Take care over caching and provide well formed
headers
• Use tools like PageSpeed Insights to analyse
performance
• Pagespeed module to modify content on the
server
19. SSL/TLS
• Use AES and compatible libraries on processors
with AES-NI for hardware acceleration
• Elliptic curve (ECDSA) for smaller certs & keys
and better performance.
• Terminate SSL at the edge and consider using
lightweight or no encryption inside the local
network.
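Whether your TLS stack actually offers hardware-friendly AES-GCM suites can be checked from Python's `ssl` module (the filter below is a quick illustrative inspection, not a tuning recipe):

```python
import ssl

ctx = ssl.create_default_context()

# List the AES-GCM suites the default context offers; on CPUs with
# AES-NI and a recent OpenSSL these run in hardware.
aes_gcm = [c["name"] for c in ctx.get_ciphers()
           if "AES" in c["name"] and "GCM" in c["name"]]
print(aes_gcm[:3])
```

On a VM, also confirm the hypervisor exposes the `aes` CPU flag to the guest, or the library will fall back to a software implementation.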
20. User Datagram Protocol
• ‘Fire and forget’ - no inbuilt reliability,
connectionless
• No handshake
• Ordering and retransmission at the application
level
• Stateless, so no connection states to manage
• DNS, VOIP, SNMP, RIP, VPNs, Games, Mosh
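The "no handshake" point is easy to see in code: a UDP datagram just goes, and any sequencing lives in the payload. A self-contained sketch over loopback (the `seq=` convention is illustrative, not part of UDP):

```python
import socket

# A UDP "exchange" with no connection setup: one datagram out, one back.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(1.0)
client.sendto(b"seq=1 ping", addr)        # sent immediately, no handshake

data, peer = server.recvfrom(1024)
server.sendto(b"seq=1 pong", peer)

reply, _ = client.recvfrom(1024)
print(reply)                              # the application tracks seq itself

client.close(); server.close()
```

If the reply never arrives, `recvfrom` times out; retransmission is entirely the application's problem, exactly as the slide says.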
21. Domain Name System
• DNS lookups can hamper user experience
significantly
• Synchronous lookup before each resource
access
• Uses UDP (usually) for client/server lookups
22. DNS
• Caches are distributed nearer to the user (DNS
resolvers/forwarders)
• Great for popular sites
• Lower-traffic sites may still require an
authoritative lookup
23. DNS CNAMES
• DNS CNAMEs - name -> name -> IP
• Two DNS lookups. Two round trips.
• Never use a CNAME at a zone apex if you have
other records in that zone.
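The extra round trip per CNAME link can be modelled with a toy in-memory resolver (zone contents below are hypothetical example records):

```python
# Toy zone: each CNAME hop costs one extra lookup / round trip.
ZONE = {
    "www.example.com": ("CNAME", "cdn.example.net"),
    "cdn.example.net": ("A", "192.0.2.10"),
}

def resolve(name):
    """Follow CNAME links until an A record, counting lookups."""
    lookups = 0
    while True:
        rtype, value = ZONE[name]
        lookups += 1
        if rtype == "A":
            return value, lookups
        name = value                      # follow the CNAME

ip, lookups = resolve("www.example.com")
print(ip, lookups)                        # two lookups for one name
```

Real resolvers often return the whole chain in one response, but a cold cache can still pay per-link round trips.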
24. DNS Time to Live
• Time a DNS record is cached in
non-authoritative servers.
• Need to strike a balance between keeping the
record cached near the user and the ability to
update the record
• 1 day is a good starting point. Decrease before
record switch overs.
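The TTL trade-off is just cache expiry. A minimal sketch of a resolver-style TTL cache, using an injectable clock so the expiry behaviour is visible without waiting a day (class and names are illustrative):

```python
import time

class DnsCache:
    """Minimal TTL cache mimicking a non-authoritative resolver."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def put(self, name, ip, ttl):
        self._store[name] = (ip, self._clock() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None                   # miss: query the authoritative server
        ip, expires = entry
        if self._clock() >= expires:
            del self._store[name]         # TTL hit 0: must re-resolve
            return None
        return ip

now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.1", ttl=86_400)   # 1-day TTL
hit = cache.get("www.example.com")
now[0] = 86_401                           # one second past the TTL
miss = cache.get("www.example.com")
print(hit, miss)
```

Lowering the TTL before a planned record change bounds how long stale answers like `hit` can survive in caches.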
25. DNS clients
• Avoid synchronous DNS lookups where possible:
async libraries, or batch process results later
• Consider local hosts files, use config
management to distribute
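One way to avoid stalling on lookups without a dedicated async DNS library is to push the blocking `getaddrinfo` calls onto worker threads. A sketch (using `localhost` so it runs offline; real code would resolve real names):

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def lookup(name):
    """Blocking getaddrinfo, run on a worker thread so callers don't stall."""
    info = socket.getaddrinfo(name, 80, proto=socket.IPPROTO_TCP)
    return name, info[0][4][0]            # first address for the name

names = ["localhost"] * 3                 # offline-safe stand-in names
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(lookup, names))
print(results)
```

The same pattern works for batch-resolving IPs in logs after the fact, rather than paying a synchronous lookup on the request path.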
26. DNS
• Keep DNS geographically close to users
• Use providers with anycast DNS servers
• Globally distribute records if the audience is
global
• Can make initial load significantly faster
27. QUIC
• Experimental protocol from Google for encrypted,
multiplexed streams over UDP
• Aims to reduce number of round trips
• May make it into the next TLS standard
• Supported by Chrome; a prototype server is
available
28. Client and Server hosts
• Watch for queuing - something in a queue means
not enough resource to service the request
• Disk IO historically a problem. Throughput in
IOPS. SSDs are reducing this latency.
• Be familiar with the standard system monitoring
tools
• Be wary of multi-threaded processes and locks
29. Cloud
• Get familiar with cloud providers’ tools. Useful
views from outside the hosts.
• Load test for 5+ cycles of monitoring
• Can provide protocol level information
• Test apps from the point of view of the users -
Nagios, Pingdom, hitting representative end points
• Don’t take their word for performance - measure it
Measured in seconds, typically, or milliseconds on the IO scale and ns on the CPU/memory scale. Minutes, hours, days for large processing tasks.
Or action that starts the chain of events. This might be a keypress, or a download request or following a link
The reaction to the action - displaying the keypressed on the screen, starting the download or finishing the download, depending on what it’s being used for, painting an initial page layout or loading the full page.
The end point is often viewed from the point of view of the next step; it is that which suffers from the latency.
Typing on a keyboard: a <40ms response is needed - the less the better.
The more interactive the lower the latency between user input and the response.
Some events are synchronous and must complete before the next step can start and will delay the next event.
Some latency is long enough for other tasks to go away and come back later - these are asynchronous.
CPUs waiting for IO to finish can be used for other tasks
Users get faster response to their interactions and get their work load done in a shorter time
A db request takes 5 seconds where normally it would take one
Measure over time, graphs are useful
L1, L2, L3, memory, disk
Open file, Read file, Write file Close file, Seek
A packet going from a->b is work
A car accelerating - latency between start and 85mph
More topically, there was a significant latency between Richard III dying and getting a king’s burial.
A CPU cycle is currently ~0.3ns; light takes ~3.3ns to travel 1m, i.e. roughly ten CPU cycles.
150,000,000 = 150ms
Sheer distance is a limiting factor. We’re reaching or have reached in some areas the point where light speed is the limiting factor.
Congestion is bandwidth over use - packets get queued and ultimately dropped.
Packet loss will lead to retransmission at the TCP layer.
With UDP, the application will have to deal with it.
This increases latency due to timeout before retransmission.
Route out may also differ for each run.
Because different paths are taken, it can be hard to tell if the delay is out or return
Measuring point to point latency requires clocks synchronised to the required degree of accuracy (< a few ms).
Sometimes ICMP Echo requests can be dropped for security reasons.
Both ICMP Echo and Time Exceeded may be given low priority compared to data traffic, skewing the values.
Demo
sudo mtr -P 53 -u 8.8.8.8
sudo mtr -P 80 -T www.google.com
TCP is designed for (relatively) long running connections transferring a (relatively) large amount of data
Makes sure packets are received by the next network layer (app) in the order they were sent.
Deals with retransmission after an error.
Quite complex, quite a lot of tweakable values, though largely well tuned by default in modern OSs - worth visiting for high-utilisation workloads
Most common protocol currently.
Connection tables can be seen with netstat -a on Windows and UNIX like OSs, including states.
telnet is useful for testing plain text TCP session.
In establishing connections there are 1.5 round trips to set up the connection.
To the US east coast that’s ~40ms one way, so a ~120ms setup cost per connection. Connections may be asynchronous.
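That setup cost is just arithmetic: 1.5 round trips is three one-way trips before any payload flows. A quick check, assuming the 40ms one-way figure quoted above:

```python
# TCP three-way handshake: SYN, SYN-ACK, ACK = 1.5 round trips,
# i.e. three one-way trips before the first byte of payload.
ONE_WAY_MS = 40                  # assumed one-way delay from the note
ROUND_TRIP_MS = 2 * ONE_WAY_MS

setup_ms = 1.5 * ROUND_TRIP_MS
print(f"{setup_ms:.0f} ms of setup per new connection")
```

This is why connection reuse (keep-alive, pooling) matters so much on long-haul links: the cost is paid once per connection, not once per request.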
There may be more latency at certain hops - e.g. CPE to ISP might be 20ms or more (ADSL).
Fewer round trips = less latency, and more effective bandwidth after decompression.
Compression server side can lead to latency in the compression. Pick a fast compression algorithm, or pre-compress files. Nginx supports this.
Timestamps are generally best left on as they are used to measure round-trip times.
Browser caches have the latency of the user’s local machine, which may be RAM or disk. A machine with a good network connection and a slow 5400rpm disk might be slower to get cached items than from a server, especially a local one.
CDNs require a good understanding of the data set and careful management. Cloudflare is of particular note as it’s free to start. CDNs also provide other functions like application firewall, DDOS mitigation.
Geographically closer - given the same link types, latency will be less. The closer, the lower the latency.
Not all hosting locations are created equally. A site may have a 100Gbps in and out, but if it’s heavy contended it may be slower for your app than a small link. Measure it.
SPDY / HTTP/2 add Multiplexing, compression, prioritisation
Pipelining is part of HTTP/1.1 and should be enabled by default these days. It’s a method of sending a stream of requests without waiting for each reply.
TCP keepalive will keep TCP sessions open for longer, but must be balanced against server resource usage, especially under heavy loads.
TCP_NODELAY disables Nagle’s algorithm, which buffers up small payloads and sends them in a single packet - e.g. a 1-byte payload carries a 20-byte TCP header, plus lower-level encapsulation data, so buffering saves bandwidth at the cost of latency.
If good cache headers get set from the outset, any system between the user’s screen and the server will benefit.
Pagespeed module is useful as a quick fix, but be sure to test before long term use.
Nothing is better than really knowing the application and tuning accordingly.
AES-NI support in recent OpenSSL. Check CPU config in any VMs in hosting providers.
EC-DSA Needs modern clients and servers.
Load balancers should be tuned for encryption hand off.
A well configured SSL termination should pass HTTP headers through to indicate the request came over a secure connection: X-Forwarded-Proto (de facto), Front-End-Https (MS).
UDP is used for message based protocol - typically low volume where speed is important.
No handshake - the packet is just sent.
Stateless, so no overhead for connection tracking in the kernel. Netstat only shows UDP listeners.
The application layer needs to handle errors and missing/out-of-order packets.
Smaller header than TCP (8 bytes vs 20 bytes).
VPNs - use UDP not TCP, because an outer-tunnel drop causes retransmission of both the inner and outer streams. This can lead to failure caused by amplification.
DNS is the look up of IP addresses from names
Used liberally in systems because IPs are hard to remember and server IPs can change.
The initial load of a web site will be a DNS request for the site’s IP. This is synchronous.
UDP is used for most client -> server lookups. Zone transfers use TCP due to the volume of data.
DNS makes heavy use of caches. These are closer to the user and server to reduce load on authoritative servers as well as provide a lower latency response to users.
Popular sites will mostly hit the cache: one user’s request that hits the authoritative server results in the record being cached for others.
CNAME records are a useful shortcut to point names at names.
Two DNS lookups occur with a CNAME record - one to find the canonical name, the other to retrieve the IP. 2 round trips.
As an aside, never use CNAMEs on a zone apex - the root of the domain - that has other records. Those other records will not be properly addressable - especially mail will fail, possibly intermittently.
Once a client request results in the authoritative server being polled, the result gets cached for the TTL of the record. Once the TTL counts down to 0, a fresh request to the authoritative server is made.
For internal DNS where the authoritative server is local, a lower TTL may be appropriate.
Concurrency helps machines do meaningful work while waiting for other tasks to complete. DNS requests can be plentiful and, if synchronous, the delays can stack up even with relatively low latency. Certainly don’t do lookups just for logging unless they are asynchronous.
Local hosts files allow the use of names and provide very low latency resolution, at the cost of ease of management and number of records. Configuration management tools can help with the management, and populate hosts files across servers.
Anycast is a mechanism using routing tables to send users to the closest server by latency or geography, while still having the same global IP.
If the audience is global, distribute the authoritative servers.
This will make the latency of the initial connection to the server lower and improve the overall experience.
QUIC is still experimental, but an interesting look at the way protocols may be headed
SSL over UDP, with some similarities to SPDY. It multiplexes connections and aims to reduce the number of round trips.
It may become standardised.
Not widely supported yet.
Queuing shows that there isn’t sufficient resource to satisfy demand. A degree of queuing is normal and desirable - the resource always has work available to do - but queuing means latency. What a reasonable length queue is depends on the speed at which requests get processed and what the latency expectations are.
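A useful sanity check on observed queue depths is Little's law, which ties arrivals, latency and in-flight work together. A sketch with illustrative numbers:

```python
# Little's law: L = lambda * W. The average number of requests in the
# system equals the arrival rate times the average time each spends there.
arrival_rate = 200        # requests per second (illustrative)
avg_latency_s = 0.05      # 50 ms per request, queueing included

avg_in_system = arrival_rate * avg_latency_s
print(avg_in_system)      # average requests in flight
```

If monitoring shows far more in-flight requests than the law predicts from measured arrival rate and latency, one of the three numbers is being measured wrongly.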
Disk IO, especially random access, historically has been a bottleneck. IOps are number of input/output operations per second. 7200rpm SATA disk does 75-100 IOPS. 15K SAS 175-210 IOPS. SSD are 1000s or 10,000s or even millions in some pre-market models.
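The IOPS figures above imply a per-operation latency: at queue depth 1, a device doing N operations per second spends about 1000/N milliseconds on each. A rough sketch (midpoints of the ranges above, purely illustrative):

```python
def ms_per_op(iops):
    """Approximate per-operation latency at queue depth 1."""
    return 1000.0 / iops

for device, iops in [("7200rpm SATA", 90), ("15K SAS", 190), ("SSD", 50_000)]:
    print(f"{device}: ~{ms_per_op(iops):.2f} ms/op")
```

This is why random-access workloads feel the jump to SSDs so dramatically: the per-operation wait drops by two to three orders of magnitude.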
sar, iostat, vmstat, top, mpstat tools
Multi-threaded processes use many cores and lots of CPU, but may take locks on some resources, which can cause bottlenecks.
AWS CloudWatch is very good. Once a service is under load, get used to the figures and set alerts on deviations. I’ve solved most AWS performance issues just with CloudWatch at 1-minute intervals.
Load testing needs to gather enough information over time. If the cycle period for monitoring is 5 minutes, at least 25 minutes of load should be applied. Ideally use smaller cycles and longer periods to see trends.
Protocol level stats for a load balancer might be the number of 400s or 500s from a web app, and the latency of requests from the point of view of the load balancer.
Monitor latency from the point of view of the clients. If the client base is global, monitor the end points they will hit globally.
Don’t trust what cloud providers say - measure it and prove it meets requirements for the given work load. As with any performance figures, they are often under ideal conditions and may not reflect results under complex conditions.