2. Based on a True Story
• NOT AN AD!
• Qrator: distributed network
● Custom TCP/IP at the bottom
● Custom management protocol at the top
● Interacting with plenty of Web servers and Web browsers on a daily basis
● 2 years of continuous debug^W Product Improvement™
4. Issue #1
• Message delivery over TCP is not guaranteed: there is no bound on when (or whether) a message will arrive
• Timeouts!
• Limit all resources, including time
• No action is itself an action
5. Timeouts
• Between recvfrom() calls
• Between requests
• Request timeout
• Lifetime of a session
• Lifetime of %OBJECTNAME%
• Long polling may be a bad idea
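A minimal sketch of two of the timeouts listed above (the gap between receive calls and the request as a whole), written in Python for brevity. The function name `recv_with_deadline` and both timeout parameters are hypothetical and chosen for illustration; a real server would likely also bound session lifetime.

```python
import socket
import time


def recv_with_deadline(sock, nbytes, idle_timeout, total_timeout):
    """Read exactly nbytes from sock or raise TimeoutError.

    idle_timeout bounds the wait for each recv() call;
    total_timeout bounds the whole read.
    """
    deadline = time.monotonic() + total_timeout
    chunks = []
    remaining = nbytes
    while remaining > 0:
        left = deadline - time.monotonic()
        if left <= 0:
            raise TimeoutError("request deadline exceeded")
        # Never wait longer than what is left of the overall budget.
        sock.settimeout(min(idle_timeout, left))
        try:
            chunk = sock.recv(remaining)
        except socket.timeout:
            raise TimeoutError("peer was idle for too long") from None
        if not chunk:
            raise ConnectionError("peer closed before sending the full message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A slow client that sends one byte at a time passes the idle check on every call but still runs into the total deadline, which is why both limits are kept.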
6. Ex. 1
• Slowloris (Apache): DoS
● (not distributed, just denial of service)
• Slow HTTP POST
● Apache, IIS, Lighttpd: DoS
● Nginx: DDoS with a botnet
7. Ex. 2
● AJAX page update at 12 rpm (requests per minute)
● Backup script switched the server off
9. Content-Length
– Limit resources for all actions
– A custom protocol should define limits on the input length
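One way to enforce an input-length limit in a custom protocol can be sketched as follows. The wire format (a 4-byte big-endian length prefix), the `MAX_MESSAGE_LEN` value, and the names `parse_frame` and `MessageTooLarge` are all assumptions made for this example, not something the slides specify.

```python
import struct

MAX_MESSAGE_LEN = 64 * 1024  # hypothetical limit, tune per protocol


class MessageTooLarge(Exception):
    """Raised when a peer announces a payload above the limit."""


def parse_frame(buffer):
    """Try to extract one length-prefixed message from buffer.

    Returns (payload, rest) when a full message is present,
    or (None, buffer) when more bytes are needed.
    """
    if len(buffer) < 4:
        return None, buffer
    (length,) = struct.unpack("!I", buffer[:4])
    # Reject before buffering anything: the check uses the announced
    # length, so an oversized message never consumes memory.
    if length > MAX_MESSAGE_LEN:
        raise MessageTooLarge(f"{length} > {MAX_MESSAGE_LEN}")
    if len(buffer) < 4 + length:
        return None, buffer
    return buffer[4:4 + length], buffer[4 + length:]
```

The point of the design is that the limit is checked on the header alone, so the receiver decides to drop the connection before it has allocated anything for the body.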
10. errno(3)
– The connection may be closed for no good reason
– Check errno after recvfrom(), sendto(), etc.
● ENOMEM
● ECONNRESET
● EANYTHING
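The errno check above could look roughly like this in Python, where the C-level errno surfaces as `OSError.errno`. The function name `recv_checked` and the status strings are hypothetical; the sketch only shows how a reset, a clean close, and a retryable condition can be told apart instead of being lumped together.

```python
import errno
import socket


def recv_checked(sock, nbytes):
    """Return (data, status) where status is 'ok', 'closed', 'reset' or 'retry'.

    Any other error (ENOMEM and the rest) is re-raised to the caller.
    """
    try:
        data = sock.recv(nbytes)
    except OSError as exc:
        if exc.errno == errno.ECONNRESET:
            # Abnormal termination: whatever was received so far is suspect.
            return b"", "reset"
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            # Non-blocking socket with nothing to read yet.
            return b"", "retry"
        raise
    if not data:
        # recv() returning no bytes is the orderly shutdown case.
        return b"", "closed"
    return data, "ok"
```

Keeping 'reset' separate from 'closed' matters because only the latter indicates that the peer finished sending on purpose.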
11. Ex. 3
● Internet Explorer: ECONNRESET is treated as successful connection termination
– Download status is ignored
– Content-Length is ignored
17. Optimization
– Text-based protocols are convenient to debug
● And you will debug
– Maybe even in production
– Making use of binary protocols is often a premature optimization
● BSON, Google Protocol Buffers
18. Optimization
● TCP socket options:
– TCP_NODELAY: disables Nagle's algorithm
● Speedup with small portions of data
– TCP_CORK (Linux): multiple portions of data in a single TCP segment
– "socket corking"
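A short sketch of setting the two options above from Python. The helper names `set_nodelay` and `send_corked` are hypothetical. `socket.TCP_CORK` is only defined on Linux, so the sketch falls back to joining the parts into a single send elsewhere; whether the kernel actually emits one segment also depends on the MSS and is not guaranteed here.

```python
import socket


def set_nodelay(sock, enabled=True):
    """Turn Nagle's algorithm off (enabled=True) or back on."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)


def send_corked(sock, parts):
    """Send several pieces while letting the kernel coalesce them."""
    cork = getattr(socket, "TCP_CORK", None)  # present on Linux only
    if cork is None:
        # Fallback assumption: one sendall() of the joined data.
        sock.sendall(b"".join(parts))
        return
    sock.setsockopt(socket.IPPROTO_TCP, cork, 1)  # cork: hold partial segments
    try:
        for part in parts:
            sock.sendall(part)
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 0)  # uncork: flush what is queued
```

A typical use is sending response headers and body as separate writes, for example `send_corked(conn, [headers, body])`, without paying for an extra small segment.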