The document describes the development of a high-performance email handling system in Ruby. The initial system, built with a Java tool and a proprietary Message Transfer Agent, was unable to scale and handle peak loads. A new system was developed using RabbitMQ for queues, Redis for storage, and parallel processing in Ruby. It was able to send over 1 million emails per hour and proved more scalable and reliable than the previous system.
18. Sent email statistics
• 1 email produces 2 updates
• 1_000_000 emails produce 2_000_000 updates
• Definitely kills the database
Wednesday, October 26, 2011
19. Aggregate counters
Increment key:
"#{email_campaign_id}_#{email_queue_id}"
20. Use aggregated counts
Extract key:
"#{email_campaign_id}_#{email_queue_id}"
Update counters for:
email_campaign_id
email_queue_id
21. Fewer updates
1_000_000 emails produce 8_000 updates
~250x fewer updates
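The aggregation described in slides 19 to 21 can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: a plain Hash stands in for the key-value store, the `sent` list and the `campaign_totals` / `queue_totals` hashes are hypothetical stand-ins for the real email stream and database tables, and the real system may have flushed counters differently.

```ruby
# In-memory stand-in for the key-value store (assumption: the real
# system used an external store). Missing keys default to 0.
counters = Hash.new(0)

# Hypothetical stream of sent emails: [email_campaign_id, email_queue_id].
sent = [[1, 10], [1, 10], [1, 11], [2, 20], [2, 20], [2, 20]]

# Slide 19: increment one aggregated key per sent email.
sent.each do |email_campaign_id, email_queue_id|
  counters["#{email_campaign_id}_#{email_queue_id}"] += 1
end

# Slide 20: extract the ids from each key and apply one update per
# campaign and one per queue, instead of two updates per email.
campaign_totals = Hash.new(0) # stand-in for the campaign table
queue_totals = Hash.new(0)    # stand-in for the queue table
database_updates = 0

counters.each do |key, count|
  email_campaign_id, email_queue_id = key.split('_').map(&:to_i)
  campaign_totals[email_campaign_id] += count
  queue_totals[email_queue_id] += count
  database_updates += 2
end

updates_without_aggregation = sent.size * 2
puts "#{database_updates} updates instead of #{updates_without_aggregation}"
```

The number of database updates now depends on the number of distinct keys rather than on the number of emails, which is where the reduction comes from.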
22. Key Value store
Tokyo Tyrant
as persistent storage for precaching
23. Coding
2 weeks of pair programming
24. And the show begins...
25. Test experiment
Benchmark:
• Test throughput
Stress test:
• Test high load handling
26. Test data
Count | Size (KB)
151 | 15
120 | 68
109 | 30
13 | 61
8 | 116
Select a proper email size to have representative results
Email length clustering (chart labels): 15 KB, 30 KB, 61 KB, 68 KB, 116 KB
Average email length is 30 KB
Selected email length is 60 KB
27. Testing
throughput
high load handling
28. Issues
• Tokyo Tyrant fails under concurrent key updates
• RabbitMQ fails with 500_000 messages in the queue
29. Solution
• Redis instead of Tokyo Tyrant
• RabbitMQ
- Queue limited to 300_000 messages
- AMQP
- Disable routing from broker
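The idea of capping the queue can be illustrated without a broker. The sketch below is not RabbitMQ code and is not taken from the talk: it uses Ruby's built-in `SizedQueue` as an in-process stand-in, with a hypothetical `LIMIT` of 3 instead of 300_000, to show how a bounded queue makes the producer wait instead of letting the backlog grow without limit.

```ruby
require 'thread'

# Hypothetical small limit for the demo; the talk used 300_000.
LIMIT = 3

# SizedQueue blocks push when LIMIT items are already pending.
queue = SizedQueue.new(LIMIT)
consumed = []

# Consumer: takes messages until it sees the :done marker.
consumer = Thread.new do
  while (message = queue.pop) != :done
    consumed << message
  end
end

# Producer: push waits whenever the queue is full, so the backlog
# never exceeds LIMIT messages.
10.times { |i| queue.push("email-#{i}") }
queue.push(:done)
consumer.join

puts "consumed #{consumed.size} messages with at most #{queue.max} queued"
```

With a real broker the same effect needs a mechanism on the publishing side or in the broker configuration; which one the authors used is not stated in the slides.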
30. Testing
throughput
high load handling
31. Optimize Code
String gsub => gsub!
String sub => sub!
String += => <<
Array map => map!
...
C’mon you know all this, right?
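For readers who want a reminder, here is a small sketch of the difference between the copying and the in-place variants listed on this slide. The strings and arrays are made-up examples; the point is only that the bang methods and `<<` modify the receiver, while their counterparts allocate a new object.

```ruby
# String.new gives a mutable string even under frozen string literals.
body = String.new('Hello __name__')
original_id = body.object_id

# gsub! edits the receiver in place (note: it returns nil when nothing matched).
body.gsub!('__name__', 'Alice')

# << appends to the same String object.
body << '!'
same_object = body.object_id == original_id

# += builds a brand new String and leaves the original untouched.
copy = body
copy += '?'

# map! replaces the elements of the same Array; map would allocate a new one.
sizes = [15, 30, 60]
sizes_id = sizes.object_id
sizes.map! { |kb| kb * 1024 }
same_array = sizes.object_id == sizes_id
```

Fewer allocations mean less garbage collection work, which adds up when the same code runs for every one of a million messages.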
32. Optimize Code
Before:
extra.each do |field, value|
  body.gsub!("__#{field}__", value)
end
body.gsub!(/__([0-9a-z_-]+)__/, '')
100000 messages @ 11.8556471347809

After:
body.gsub!(/__(.*?)__/) do |match|
  extra[$1.to_sym]||''
end
100000 messages @ 4.43633253574371

2.6 times faster
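The two versions on this slide can be compared with a self-contained script. The template, the `EXTRA` fields, the method names and the iteration count below are hypothetical, so the timings will not match the numbers from the talk; the sketch only shows that one pass with a block produces the same result as one pass per field plus a cleanup pass.

```ruby
require 'benchmark'

# Hypothetical template and merge fields, not taken from the talk.
TEMPLATE = 'Hello __first_name__ __last_name__, your code is __code__. __unknown__'
EXTRA = { first_name: 'Alice', last_name: 'Example', code: '12345' }

# Original approach: one pass over the body per field, then a cleanup
# pass that removes placeholders that had no value.
def render_per_field(template, extra)
  body = template.dup
  extra.each do |field, value|
    body.gsub!("__#{field}__", value)
  end
  body.gsub!(/__([0-9a-z_-]+)__/, '')
  body
end

# Optimized approach: a single pass that looks every placeholder up in
# the hash and falls back to an empty string.
def render_single_pass(template, extra)
  body = template.dup
  body.gsub!(/__(.*?)__/) { extra[$1.to_sym] || '' }
  body
end

iterations = 10_000
per_field = Benchmark.realtime { iterations.times { render_per_field(TEMPLATE, EXTRA) } }
single_pass = Benchmark.realtime { iterations.times { render_single_pass(TEMPLATE, EXTRA) } }

puts format('per field:   %.4f s', per_field)
puts format('single pass: %.4f s', single_pass)
```

The gain grows with the number of merge fields, because the first version scans the whole body once for every field.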
33. Testing
throughput
high load handling
34. You should see the light
at the end of the tunnel
35. SMTP is slow
Synchronous execution
S: 220 smtp.example.com ESMTP Postfix
C: HELO relay.example.org
S: 250 Hello relay.example.org, I am glad to meet you
C: MAIL FROM:<bob@example.org>
S: 250 Ok
C: RCPT TO:<alice@example.com>
S: 250 Ok
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: From: "Bob Example" <bob@example.org>
C: To: "Alice Example" <alice@example.com>
C: Cc: theboss@example.com
C: Date: Tue, 15 Jan 2008 16:02:43 -0500
C: Subject: Test message
C:
C: Hello Alice.
C: This is a test message with 5 header fields and 4 lines in the message body.
C: Your friend,
C: Bob
C: .
S: 250 Ok: queued as 12345
C: QUIT
S: 221 Bye
{The server closes the connection}
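A rough back-of-the-envelope model shows why this dialogue limits throughput. In the exchange above the client has to wait for a server reply after each of six steps. The sketch below assumes a hypothetical round-trip time and ignores the initial greeting, server processing time and connection reuse, so it is an upper-bound illustration rather than a measurement from the talk.

```ruby
# Client steps from the dialogue above that must wait for a server reply.
SYNCHRONOUS_STEPS = ['HELO', 'MAIL FROM', 'RCPT TO', 'DATA', 'end of data (.)', 'QUIT'].freeze

# Upper bound on messages per hour when every step costs one round trip.
# round_trip_ms and connections are hypothetical inputs.
def max_messages_per_hour(round_trip_ms, connections: 1)
  ms_per_message = SYNCHRONOUS_STEPS.size * round_trip_ms
  (3_600_000 / ms_per_message) * connections
end

puts max_messages_per_hour(5)                  # one connection, 5 ms round trip
puts max_messages_per_hour(5, connections: 10) # ten parallel connections
```

Under these assumptions a single connection with a 5 ms round trip cannot exceed 120_000 messages per hour, so reaching a million per hour requires either many parallel connections or a protocol with fewer round trips.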
36. ECStream
Internal protocol of our Momentum MTA
“Just send C structure to the socket”
Wrote C native extension using FFI
5-10x faster than SMTP
37. Testing
throughput
high load handling
42. Got issues
We received complaints about broken emails
http://eimages.ratepoint.com => http://eimagesratepoint.com
• Switch to previous system
• Figure out the issue
• The problem is on the email providers' side
43. Back to slow SMTP
CPU is the bottleneck
Optimize more
Stop
Change point of view
Change servers to have
more CPU and less RAM
49. Testing
throughput
high load handling
51. Monitoring
• Scout app
- process usage plugin
- redis monitoring
- server overview
more at: https://github.com/railsware/scout-app-plugins
• Homemade real-time monitor