AWS re:Invent 2016: Making Every Packet Count (NET404)

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Mike Furr, Principal Engineer, EC2 Networking
December 2, 2016
Making Every Packet Count
NET404

What to Expect from this Session
• TCP
• Tuning TCP on Linux
• Performance
• Application: watch us increase network performance 137%

TCP
• Transmission Control Protocol
• Underlies SSH, HTTP, *SQL, SMTP
• Stream delivery, flow control

Limiting in-flight data
[Diagram: Jack and Jill each have a receive window and a congestion window; the round trip time spans the path between them]

Bandwidth delay product
[Diagram: Jack and Jill, 2 ms round-trip time]

Bandwidth delay product
[Diagram: Jack and Jill, 100 ms round-trip time]
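
The bandwidth delay product is the amount of data that has to be in flight to keep a path full: bandwidth multiplied by round-trip time. A minimal sketch for the two round-trip times on these slides, assuming a hypothetical 10 Gbps link (the slides do not state a link speed at this point):

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_seconds):
    """Bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_seconds / 8

LINK_BPS = 10e9  # assumed link speed, not taken from the slides

for rtt_ms in (2, 100):
    bdp = bandwidth_delay_product_bytes(LINK_BPS, rtt_ms / 1000)
    print(f"{rtt_ms} ms RTT: {bdp / 1e6:.1f} MB in flight")
```

Under that assumption the longer path needs 50 times more data in flight, which is why window sizes matter far more at 100 ms than at 2 ms.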

Receive window
• Receiver controlled, signaled to sender
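
Because the sender can have at most one receive window of unacknowledged data outstanding per round trip, the window caps throughput at roughly window / RTT. A small sketch of that bound, using a hypothetical 65535-byte window (the largest that can be advertised without the TCP window scale option) and the two round-trip times from the earlier slides:

```python
def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6

WINDOW_BYTES = 65535  # hypothetical window, chosen only for illustration

for rtt_ms in (2, 100):
    cap = max_throughput_mbps(WINDOW_BYTES, rtt_ms / 1000)
    print(f"{rtt_ms} ms RTT: at most {cap:.2f} Mbps")
```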

Congestion window
• Sender controlled
• Window is managed by the congestion control algorithm
• Inputs vary by algorithm


Initial congestion window
$ ip route list
default via 10.16.16.1 dev eth0
10.16.16.0/24 dev eth0 proto kernel scope link
169.254.169.254 dev eth0 scope link
3 × 1448 = 4344 bytes

Initial congestion window
# ip route change 10.16.16.0/24 dev eth0 \
    proto kernel scope link initcwnd 16
$ ip route list
[…] initcwnd 16
16 × 1448 = 23168 bytes
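
To see why a larger initial congestion window helps short transfers, the sketch below counts round trips under an idealized slow start. It assumes the window doubles every round trip, with no loss, no delayed ACKs, and no receive window limit; real connections will differ. The object sizes are illustrative.

```python
def round_trips_to_send(object_bytes, initcwnd_segments, mss=1448):
    """Round trips needed under idealized slow start (window doubles each trip)."""
    cwnd = initcwnd_segments
    sent = 0
    trips = 0
    while sent < object_bytes:
        sent += cwnd * mss  # one full window goes out per round trip
        cwnd *= 2           # idealized doubling, no loss
        trips += 1
    return trips

for size in (10 * 1024, 100 * 1024):
    for initcwnd in (3, 16):
        trips = round_trips_to_send(size, initcwnd)
        print(f"{size} bytes, initcwnd {initcwnd}: {trips} round trips")
```

In this model a 10 KB object fits in the first window when initcwnd is 16, but needs a second round trip when it is 3.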

Impact of loss on TCP throughput
[Chart: x-axis is loss rate, 0% to 10%; y-axis runs from 0 to 100]
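
One way to get a feel for how sharply loss cuts throughput is the well-known Mathis approximation for Reno-style congestion control, which bounds throughput at about (MSS / RTT) × C / sqrt(p) with C = sqrt(3/2). This model is not from the slides and does not describe CUBIC or Illinois exactly; the MSS and RTT below are illustrative inputs.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    """Approximate per-connection throughput bound for Reno-style TCP."""
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_seconds) * c / math.sqrt(loss_rate)

MSS = 1448   # bytes, illustrative
RTT = 0.080  # seconds, illustrative

for loss in (0.001, 0.002, 0.01, 0.05, 0.10):
    mbps = mathis_throughput_bps(MSS, RTT, loss) / 1e6
    print(f"loss {loss:.1%}: about {mbps:.2f} Mbps per connection")
```

The bound falls with the square root of the loss rate, so even a fraction of a percent of loss has a large effect on a long path.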

Loss is visible as TCP retransmissions
$ netstat -s | grep retransmit
58496 segments retransmitted
52788 fast retransmits
135 forward retransmits
3659 retransmits in slow start
392 SACK retransmits failed

Socket level diagnostic
$ ss -ite
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 3829960 10.16.16.18:https 10.16.16.75:52008
    timer:(on,012ms,0) uid:498 ino:7116021 sk:0001c286 <->
    ts sack cubic wscale:7,7 rto:204 rtt:1.423/0.14 ato:40
    mss:1448 cwnd:138 ssthresh:80 send 1123.4Mbps unacked:138
    retrans:0/11737 rcv_space:26847
Fields highlighted on the following slides:
• TCP state: ESTAB
• Bytes queued for transmission: Send-Q 3829960
• Congestion control algorithm: cubic
• Retransmission timeout: rto:204
• Congestion window: cwnd:138
• Retransmissions: retrans:0/11737
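
The `send` figure in the `ss` output can be cross-checked from other fields in the same sample: one congestion window of `mss`-sized segments per round trip. The sketch below parses the `key:value` tokens from the sample line; the parser is illustrative and only handles the fields used here.

```python
import re

# Info line copied from the ss sample in this deck.
SS_INFO = ("ts sack cubic wscale:7,7 rto:204 rtt:1.423/0.14 ato:40 "
           "mss:1448 cwnd:138 ssthresh:80 send 1123.4Mbps unacked:138 "
           "retrans:0/11737 rcv_space:26847")

def parse_ss_info(text):
    """Collect the key:value tokens of an ss info line into a dict."""
    return dict(re.findall(r"(\w+):(\S+)", text))

def estimated_send_mbps(fields):
    """cwnd * mss bits delivered once per round trip, in Mbps."""
    rtt_ms = float(fields["rtt"].split("/")[0])  # "rtt/rttvar", both in ms
    bits_per_rtt = int(fields["cwnd"]) * int(fields["mss"]) * 8
    return bits_per_rtt / (rtt_ms / 1000) / 1e6

fields = parse_ss_info(SS_INFO)
print(f"estimated send rate: {estimated_send_mbps(fields):.1f} Mbps")
```

The estimate agrees with the 1123.4Mbps that `ss` reports, which shows how directly the congestion window and round-trip time determine the achievable rate.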

Monitoring retransmissions in real time
Observable using Linux kernel tracing
# tcpretrans
TIME PID LADDR:LPORT -- RADDR:RPORT STATE
03:31:07 106588 10.16.16.18:443 R> 10.16.16.75:52291 ESTABLISHED
https://github.com/brendangregg/perf-tools/

Congestion control algorithm

Congestion control algorithms in Linux
• New Reno: Pre-2.6.8
• BIC: 2.6.8 – 2.6.18
• CUBIC: 2.6.19+
• Pluggable architecture
• Other algorithms often available: Vegas, Illinois, Westwood, Highspeed, Scalable

Tuning congestion control algorithm
$ sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = cubic reno
$ find /lib/modules -name 'tcp_*'
[…]
# modprobe tcp_illinois
$ sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = cubic reno illinois
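
A script that changes the algorithm may want to confirm first that the module is loaded. On Linux the `net.ipv4.*` sysctls are also exposed as files under `/proc/sys/net/ipv4/`, so a minimal sketch can read the list directly; the helper names here are hypothetical.

```python
from pathlib import Path

PROC_DIR = Path("/proc/sys/net/ipv4")  # where net.ipv4.* sysctls appear on Linux

def available_algorithms(proc_dir=PROC_DIR):
    """Return the algorithm names the kernel currently offers."""
    text = (Path(proc_dir) / "tcp_available_congestion_control").read_text()
    return text.split()

def is_available(name, proc_dir=PROC_DIR):
    """True if `name` (for example "illinois") can be selected right now."""
    return name in available_algorithms(proc_dir)
```

Calling `is_available("illinois")` before and after `modprobe tcp_illinois` should mirror the two `sysctl` outputs shown on the slide.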

Tuning congestion control algorithm
# sysctl net.ipv4.tcp_congestion_control=illinois
net.ipv4.tcp_congestion_control = illinois
# echo "net.ipv4.tcp_congestion_control = illinois" > /etc/sysctl.d/01-tcp.conf
[Restart network processes]

Retransmission timer
• Input to when the congestion control algorithm considers a packet lost
• Too low: spurious retransmission; congestion control can over-react and be slow to re-open the congestion window
• Too high: increased latency while algorithm determines a packet is lost and retransmits

Tuning retransmission timer minimum
• Default minimum: 200ms
# ip route list
[Callout on the route list: the route to other instances in our subnet (same AZ)]

Tuning retransmission timer minimum
# ip route list
# ip route change 10.16.16.0/24 dev eth0 proto kernel \
    scope link rto_min 10ms
# ip route list
10.16.16.0/24 dev eth0 proto kernel scope link rto_min lock 10ms
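
To see why the minimum dominates on a low-latency path, the sketch below uses the standard form of the calculation from RFC 6298: smoothed RTT plus four times the RTT variance, with a floor applied. The Linux kernel applies its floor somewhat differently (the earlier `ss` sample shows rto:204), so treat this as an approximation. The RTT figures are the rtt:1.423/0.14 values from that sample.

```python
def rto_ms(srtt_ms, rttvar_ms, rto_min_ms):
    """RFC 6298 style timeout: SRTT + 4 * RTTVAR, never below the minimum."""
    return max(rto_min_ms, srtt_ms + 4 * rttvar_ms)

SRTT_MS, RTTVAR_MS = 1.423, 0.14  # from the ss sample output

for floor in (200, 10):
    print(f"rto_min {floor} ms -> timeout {rto_ms(SRTT_MS, RTTVAR_MS, floor)} ms")
```

The measured path would justify a timeout of about 2 ms, so with the default floor a lost packet can wait roughly a hundred round trips before a timeout-driven retransmit.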

Queueing along the network path
• Intermediate routers along a path have interface buffers
• High load leads to more packets in buffer
• Latency increases due to queue time
• Can trigger retransmission timeouts

Active queue management
$ tc qdisc list
qdisc mq 0: dev eth0 root
qdisc pfifo_fast 0: dev eth0 parent :1 bands 3 […]
qdisc pfifo_fast 0: dev eth0 parent :2 bands 3 […]
# tc qdisc add dev eth0 root fq_codel
$ tc qdisc list
qdisc fq_codel 8006: dev eth0 root refcnt 9 limit 10240p flows 1024 quantum 9015 target 5.0ms interval 100.0ms ecn
http://www.bufferbloat.net/projects/codel/wiki

Maximum transmission unit
• Header overhead: 3.47% at a 1500B MTU (1448B payload) vs. 0.58% at a 9001B MTU (8949B payload)
• Improvement seen among instances in your VPC
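
The two overhead figures can be reproduced by assuming 52 bytes of headers per packet: a 20-byte IPv4 header plus a 32-byte TCP header that carries the timestamp option. That breakdown is an assumption that matches the payload sizes on the slide; Ethernet framing is ignored.

```python
HEADER_BYTES = 20 + 32  # assumed: IPv4 header + TCP header with timestamps

def payload_and_overhead(mtu):
    """Return (payload bytes, header overhead in percent) for one packet."""
    payload = mtu - HEADER_BYTES
    overhead_pct = HEADER_BYTES / mtu * 100
    return payload, overhead_pct

for mtu in (1500, 9001):
    payload, overhead = payload_and_overhead(mtu)
    print(f"MTU {mtu}: {payload}B payload, {overhead:.2f}% overhead")
```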

Tuning maximum transmission unit
# ip link list
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 06:f1:b7:e1:3b:e7
# ip route list

Tuning maximum transmission unit
# ip route change default via 10.16.16.1 dev eth0 mtu 1500
# ip route list
default via 10.16.16.1 dev eth0 mtu 1500

Amazon EC2 enhanced networking

EC2 enhanced networking
• Higher I/O (packets per second) performance
• Lower CPU utilization
• Lower inter-instance latency
• Low network jitter
• Instance families: M4, C4, C3, R3, P2, X1, I2, D2 (with HVM)
• Drivers built into Windows and Amazon Linux AMIs
• Questions? See re:Invent 2014 session SDD419

Test setup
• m4.10xlarge instances – Jack and Jill
• Amazon Linux 2015.09 (Kernel 4.1.7-15.13.amzn1)
• Web Server: nginx 1.8.0
• Client: ApacheBench 2.3
• TLSv1,ECDHE-RSA-AES256-SHA,2048,256
• Transferring incompressible data (random bits)
• Origin data stored in tmpfs (RAM based; no server disk I/O)
• Data discarded once retrieved (no client disk I/O)

Example ApacheBench output
[ … ]
Concurrency Level: 100
Time taken for tests: 59.404 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 104900000 bytes
HTML transferred: 102400000 bytes
Requests per second: 168.34 [#/sec] (mean)
Time per request: 594.038 [ms] (mean)
Time per request: 5.940 [ms] (mean, across all concurrent requests)
Transfer rate: 1724.49 [Kbytes/sec] received
[ … ]
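
The summary lines in this output follow from four of the raw figures: request count, concurrency, elapsed time, and total bytes. A short sketch that recomputes them from the sample (the function name is hypothetical, and ApacheBench uses an unrounded elapsed time internally, so the last digit can differ slightly):

```python
def ab_summary(requests, concurrency, seconds, total_bytes):
    """Recompute ApacheBench's derived metrics from its raw counters."""
    return {
        "requests_per_second": requests / seconds,
        "time_per_request_ms": concurrency * seconds * 1000 / requests,
        "time_per_request_all_ms": seconds * 1000 / requests,
        "transfer_rate_kbytes": total_bytes / 1024 / seconds,
    }

# Figures taken from the sample output.
summary = ab_summary(requests=10000, concurrency=100,
                     seconds=59.404, total_bytes=104900000)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that the first "Time per request" is the latency one client sees, while the second divides it by the concurrency level.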

Application 1
HTTPS with intermediate network loss
[Diagram: Jack and Jill, with 0.2% loss on the path between them]

Test setup
• 1 test server instance, 1 test client instance
• 80 ms RTT
• 160 parallel clients retrieving a 100 MB object 5 times
$ ab -n 100 -c 20 https://server/100m [* 8]
• Simulated packet loss
# tc qdisc add dev eth0 root netem loss 0.2%
• Goal: Minimize throughput impact with 0.2% loss

Results – application 1
Test                                             Bandwidth   Mean Time
All defaults – no loss                           4163 Mbps   27.9s
All defaults – 0.2% simulated loss               1469 Mbps   71.8s
Increased initial congestion window w/ loss      1328 Mbps   80.6s
Doubled server-side TCP buffers w/ loss          1366 Mbps   78.6s
Illinois congestion control algorithm w/ loss    3486 Mbps   28.2s
137% increase in performance! (Illinois vs. defaults, both with 0.2% loss: 1469 Mbps to 3486 Mbps)
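
The headline figure is a relative change in bandwidth between two rows of the results: all defaults with 0.2% loss, and Illinois with the same loss. A quick check of that arithmetic, using the bandwidth values from the results:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100

DEFAULTS_NO_LOSS = 4163   # Mbps
DEFAULTS_WITH_LOSS = 1469  # Mbps
ILLINOIS_WITH_LOSS = 3486  # Mbps

print(f"cost of loss: {percent_change(DEFAULTS_NO_LOSS, DEFAULTS_WITH_LOSS):.1f}%")
print(f"gain from Illinois: {percent_change(DEFAULTS_WITH_LOSS, ILLINOIS_WITH_LOSS):.1f}%")
```

The gain is measured against the lossy baseline, so it recovers most, but not all, of the bandwidth seen without loss.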

Application 2
Bulk data transfer; high RTT path

Test setup
• 80 ms RTT
• 8 parallel clients retrieving a 1 GB object 2 times
$ ab -n 2 -c 1 https://server/1g [* 8]
• Goal: Maximize the throughput / minimize transfer time

Results – application 2
Test                                             Bandwidth   Mean Time
All defaults                                     2164 Mbps   30.4s
Doubled TCP buffers on server end                1780 Mbps   37.4s
Doubled TCP buffers on client end                2462 Mbps   27.6s
Active queue management on server                2249 Mbps   29.3s
Client buffers + AQM                             2730 Mbps   24.5s
Illinois CC + client buffers + AQM               2847 Mbps   23.0s
Illinois CC + server & client buffers + AQM      2865 Mbps   23.5s
32% increase in performance!

Application 3
Bulk data transfer; low RTT path

Test setup
• 1.2 ms RTT
• 8 parallel clients retrieving a 10 GB object 2 times
$ ab -n 2 -c 1 https://server/10g [* 8]
• Start at Internet default MTU, then increase
• Goal: Maximize the throughput / minimize transfer time

Results – application 3
Test                                             Bandwidth   Mean Time
All defaults + 1500B MTU                         8866 Mbps   74.0s
9001B MTU                                        9316 Mbps   70.4s
Active Queue Management (+MTU)                   9316 Mbps   70.4s
5% increase in bandwidth

Application 4
High transaction rate HTTP service

Test setup
• 80 ms RTT
• HTTP, not HTTPS
• 6400 parallel clients retrieving a 10k object 100 times
$ ab -n 20000 -c 200 http://server/10k [* 32]
• Goal: Minimize latency

Results – application 4
Test                                             Bandwidth   Mean Time
All defaults                                     2580 Mbps   195.3ms
Initial congestion window – 16 packets           2691 Mbps   189.2ms
Illinois CC + initial congestion window          2649 Mbps   186.2ms
4.6% decrease in latency

Takeaways
• The network doesn’t have to be a black box – Linux tools can be used to interrogate and understand it
• Simple tweaks to settings can dramatically increase performance – test, measure, change
• Understand what your application needs from the network, and tune accordingly

Remember to complete your evaluations!
