The document discusses the challenges of implementing a reply-via-email service on the oDesk platform. It describes how messages on the oDesk Message Center would be delivered to users' emails, and how users could reply directly from their emails to post back on the Message Center. However, implementing this system led to problems with heavy loads, spam filtering, and ensuring replies were properly tracked. The document explores various technical approaches taken to address these issues over several years of operating the service.
3. OK, let’s try
To be, or not to be, that is the Question:
Whether ’tis Nobler in the mind to ſuffer
The Slings and Arrows of outragious Fortune,
Or to take Armes againſt a Sea of troubles,
And by opposing end them: to dye, to ſleepe
No more; and by a sleep, to say we end
The Heart-ake, and the thouſand Naturall ſhockes
That Flesh is heyre too? 'Tis a consummation
Deuoutly to be wish'd. To dye, to sleepe,
To sleep, perchance to Dream; I, there's the rub,
For in that sleep of death, what dreams may come,
4. No. That doesn’t work.
• What works here:
– Slides in English
– Speech in Russian
12. Reply-via-Email overview
• Alice writes a message to Bob via the Message Center (MC)
• The message is delivered to Bob's MC
• The message is delivered to Bob's E-Mail
• Bob checks his E-Mail and replies to it
• Bob's reply arrives at the oDesk mailgate
• oDesk decodes the E-Mail and puts Bob's reply into MC
the same way as if it had been written via the web interface
13. Postfix mail server configuration
• All incoming E-Mails to the odesk.com domain
are routed by the Postfix mail server
• Let’s create a reply.odesk.com subdomain
• Let’s tune the Postfix configuration to pipe every
incoming mail sent to reply.odesk.com into
a new script via STDIN
• …
• PROFIT!
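Once Postfix pipes a message into the script, the first job is to read the raw mail from STDIN and find the reply address. Below is a minimal sketch of that step in core Perl. The helpers `read_message` and `reply_token` are hypothetical names, and the header parsing is deliberately simplified; the real mcreply.pl is not shown on the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: read one raw message from a filehandle and
# return (\%headers, $body). Only the first occurrence of a header is kept.
sub read_message {
    my ($fh) = @_;
    local $/;                               # slurp the whole message
    my $raw = <$fh>;
    $raw = '' unless defined $raw;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;
    $head =~ s/\r?\n[ \t]+/ /g;             # unfold continuation lines
    my %headers;
    for my $line (split /\r?\n/, $head) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $headers{lc $name} = $value unless exists $headers{lc $name};
    }
    return (\%headers, $body);
}

# Extract the 26-character local part addressed to reply.odesk.com, if any.
sub reply_token {
    my ($headers) = @_;
    my $to = defined $headers->{to} ? $headers->{to} : '';
    my ($token) = $to =~ /([a-z0-9]{26})\@reply\.odesk\.com/i;
    return defined $token ? lc $token : undef;
}

# Demo with an in-memory filehandle; under Postfix the script would pass \*STDIN.
my $sample = "From: bob\@example.com\n"
           . "To: abcdefghijklmnopqrstuvwxyz\@reply.odesk.com\n"
           . "Subject: Re: Job offer\n"
           . "\n"
           . "Sounds good.\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my ($headers, $body) = read_message($fh);
print reply_token($headers), "\n";          # prints abcdefghijklmnopqrstuvwxyz
```

Reading from a filehandle argument rather than STDIN directly keeps the parser testable with saved messages, in the spirit of `./mcreply.pl < input.txt`.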
15. E-Mail address generation
• Requirements for the address syntax
– fixed-size string
– hashing, i.e. completely different addresses on a one-symbol change in username
– [a-z0-9]+@reply.odesk.com
– http://en.wikipedia.org/wiki/Email_address#Syntax
• What should be encrypted
– Reply-To address should be unique per
combination of recipient_id, thread_id and post_id
– Each of the 3 params is a 4-byte unsigned int
• All above is for ‘Reply-To’. ‘From’ is different!
16. EncryptThreadRecipient
    use MIME::Base32;
    use Crypt::OpenSSL::AES;

    # _get_checksum() and _get_cipher() are helpers not shown on the slide
    sub EncryptThreadRecipient {
        my ($thread_id, $post_id, $recipient_id) = @_;
        # 1. Prepare the raw encryption unit (12 bytes: three 32-bit unsigned ints)
        my $raw_unit = pack("LLL", $thread_id, $post_id, $recipient_id);
        # 2. Create the 16-byte sequence to encrypt (exactly one AES block)
        #    16 = 12:source + 3:checksum + 1:random_salt
        my $to_crypt = $raw_unit . _get_checksum($raw_unit) .
                       pack("C", int(rand(256)));
        # 3. Encrypt the 16-byte sequence with AES
        my $cipher = _get_cipher();
        return lc MIME::Base32::encode($cipher->encrypt($to_crypt));
    }
• Result: \w{26}@reply.odesk.com
(MIME::Base32 + Crypt::OpenSSL::AES)
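The 26 characters follow from the sizes: 16 bytes are 128 bits, Base32 carries 5 bits per character, and 128 / 5 rounded up is 26. The sketch below shows that layout and the reverse direction (decode, verify checksum, unpack) in core Perl only. It is an illustration under assumptions, not the oDesk code: the AES step is left out entirely, the checksum is assumed to be the first 3 bytes of an MD5 digest (the slide does not show `_get_checksum`), and `base32_encode`, `base32_decode`, `build_unit` and `parse_unit` are hypothetical names.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

my @ALPHABET = ('a' .. 'z', '2' .. '7');    # Base32 alphabet, lowercased
my %VALUE;
@VALUE{@ALPHABET} = (0 .. 31);

# Encode bytes as unpadded lowercase Base32 (5 bits per character).
sub base32_encode {
    my ($bytes) = @_;
    my $bits = unpack('B*', $bytes);
    $bits .= '0' x ((5 - length($bits) % 5) % 5);
    return join '', map { $ALPHABET[oct("0b$_")] } $bits =~ /(.{5})/g;
}

# Decode unpadded lowercase Base32 back to bytes, dropping the padding bits.
sub base32_decode {
    my ($text) = @_;
    my $bits = join '', map { sprintf('%05b', $VALUE{$_}) } split //, $text;
    $bits = substr($bits, 0, length($bits) - length($bits) % 8);
    return pack('B*', $bits);
}

# ASSUMPTION: 3-byte checksum taken from MD5; the original helper is not shown.
sub checksum3 { return substr(md5($_[0]), 0, 3) }

# 16 bytes = 12:source + 3:checksum + 1:random_salt (AES step omitted here).
sub build_unit {
    my ($thread_id, $post_id, $recipient_id) = @_;
    my $raw = pack('LLL', $thread_id, $post_id, $recipient_id);
    return $raw . checksum3($raw) . pack('C', int(rand(256)));
}

# Return (thread_id, post_id, recipient_id), or an empty list if invalid.
sub parse_unit {
    my ($unit) = @_;
    return unless length($unit) == 16;
    my $raw = substr($unit, 0, 12);
    return unless substr($unit, 12, 3) eq checksum3($raw);
    return unpack('LLL', $raw);
}

my $local_part = base32_encode(build_unit(1001, 42, 7));
my @ids = parse_unit(base32_decode($local_part));
print length($local_part), " chars: @ids\n";    # prints "26 chars: 1001 42 7"
```

The checksum lets the receiving side reject an address that was mistyped or guessed before any database lookup happens; the random salt byte makes two addresses for the same triple differ.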
17. Which incoming mails we block (exit code 69)
• Automatically generated E-Mails from robots
– Including various kinds of out-of-office responders
• E-Mails addressed to mc-\w{10}@reply.odesk.com (From),
instead of \w{26}@reply.odesk.com (Reply-To)
• More than 2 E-Mails per 2 minutes with the same values in the
'From:' and 'Subject:' fields
(primarily to prevent auto-reply ping-pong, secondarily as spam
protection)
• More than 30 E-Mails per 30 minutes to the same MC thread
• Any E-Mail reply to an MC thread that already has >= 500 posts
• More than 5 replies to the same \w{26}@reply.odesk.com
• E-Mail replies from suspended oDesk accounts
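The "N mails per M minutes" rules above can be sketched as a sliding-window counter. This is only an illustration: `make_limiter` is a hypothetical helper, and it keeps its state in memory, whereas a script started once per mail by Postfix would need shared storage (a database or cache) that the slides do not describe. Whether blocked mails also count toward the limit is a design choice; here they do not.

```perl
use strict;
use warnings;

# Hypothetical sliding-window limiter: allows at most $max_events
# accepted events per key within any $window_seconds span.
sub make_limiter {
    my ($max_events, $window_seconds) = @_;
    my %seen;                               # key => array ref of timestamps
    return sub {
        my ($key, $now) = @_;
        $now = time unless defined $now;
        my $stamps = $seen{$key} ||= [];
        # Forget events that have fallen out of the window.
        @$stamps = grep { $_ > $now - $window_seconds } @$stamps;
        return 0 if @$stamps >= $max_events;    # over the limit: block
        push @$stamps, $now;
        return 1;                               # accepted
    };
}

# "More than 2 E-Mails per 2 minutes" with the same From: and Subject:
my $allow = make_limiter(2, 120);
my $key   = join "\0", 'bob@example.com', 'Re: Job offer';
for my $t (1000, 1010, 1020) {
    print $allow->($key, $t) ? "accept" : "block", "\n";
}
# prints: accept, accept, block
```

The same helper covers the per-thread rule with `make_limiter(30, 1800)` and a thread id as the key.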
18. E-Mail content processing
• use MIME::Parser;
• Extract the part with Content-Type: text/html,
or with text/plain
• Decode the charset given in Content-Type
• Decode attachments, if any
• $reply_text = HTML::FormatText->format_string(
      $reply_text,
      leftmargin => 0, rightmargin => 65535,
  );
• Trim the quoted part (reinventing the wheel once more)
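Trimming the quoted part is a heuristic, since every mail client marks the quoted original differently. The sketch below is one possible approach, not the rules the oDesk script used: `trim_quoted` is a hypothetical helper that cuts the text at the first line that looks like a quote or an attribution line.

```perl
use strict;
use warnings;

# Hypothetical helper: keep everything above the first quoted line.
# The three patterns are common conventions, not an exhaustive list.
sub trim_quoted {
    my ($text) = @_;
    my @kept;
    for my $line (split /\r?\n/, $text) {
        last if $line =~ /^\s*>/;                               # "> quoted"
        last if $line =~ /^On .+ wrote:\s*$/;                   # attribution
        last if $line =~ /^-{2,}\s*Original Message\s*-{2,}/i;  # Outlook style
        push @kept, $line;
    }
    pop @kept while @kept && $kept[-1] =~ /^\s*$/;  # drop trailing blank lines
    return join "\n", @kept;
}

my $reply = "Sounds good.\n"
          . "\n"
          . "On Mon, Jan 2, 2012, Alice wrote:\n"
          . "> Can you start on Monday?\n";
print trim_quoted($reply), "\n";            # prints "Sounds good."
```

A reply written below or between quoted lines would be truncated by this approach, which is one reason such trimming tends to grow extra rules over time.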
20. First problems
• User has no idea whether e-mail was accepted or not
• User has no idea why e-mail was rejected
• \w{26}@reply.odesk.com addresses are flooding
users’ mail address books
– But viruses like them and use them
– Even LinkedIn-generated invitations use them
• And it is hard for us to add new blocking rules
• Lack of logging makes tracking hell
• Heavy load, or too much rejected spam
22. Simplify tracking with exit codes
• ./mcreply.pl < input.txt
• echo $?
– use constant EX_SUCCESS => 0;
– use constant EX_TEMPFAIL => 75;
– use constant EX_UNAVAILABLE => 69;
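These values come from the sysexits.h conventions, which Postfix's pipe(8) delivery agent uses to interpret a command's exit status: 75 should make Postfix retry later, and 69 should make it bounce the mail. A minimal sketch of how a script might map its outcomes onto them follows; the outcome names and `exit_code_for` are hypothetical, since the slides do not list the script's internal states.

```perl
use strict;
use warnings;

use constant EX_SUCCESS     => 0;
use constant EX_TEMPFAIL    => 75;
use constant EX_UNAVAILABLE => 69;

# Hypothetical outcome names, for illustration only.
my %EXIT_FOR = (
    posted       => EX_SUCCESS,        # reply stored in MC
    backend_down => EX_TEMPFAIL,       # try again later
    blocked      => EX_UNAVAILABLE,    # rejected by a blocking rule
);

sub exit_code_for {
    my ($outcome) = @_;
    # Unknown outcomes are treated as temporary so the mail is retried, not lost.
    return exists $EXIT_FOR{$outcome} ? $EXIT_FOR{$outcome} : EX_TEMPFAIL;
}

for my $outcome (qw(posted backend_down blocked)) {
    printf "%-12s -> exit %d\n", $outcome, exit_code_for($outcome);
}
# A real script would finish with: exit exit_code_for($outcome);
```

With this in place, replaying a saved message through `./mcreply.pl < input.txt` and checking `echo $?` shows which branch the script took.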
23. Complexity of tracking
• How do we track E-Mails lost before they reach ./mcreply.pl?
– Lost on the Internet
– Lost due to oDesk downtime
– Lost due to Postfix misconfiguration
– Lost due to ./mcreply.pl misconfiguration
• Log every e-mail? Or headers only?
Or meta-info only?
– Question of size
– Question of log lifetime
26. Anybody wanna Viagra pills?
1. Let’s sell Viagra to bob@gmail.com using oDesk
2. Compose and send an E-Mail with these fields:
– From: bob@gmail.com
– To: any \w{26}@reply.odesk.com
– Subject: You should buy Viagra pills!
3. The E-Mail is rejected by the mcreply.pl script
4. A non-delivery notification goes from odesk.com to
bob@gmail.com, with the original message attached
5. Bob gets the Viagra offer when he checks his mail
6. …
7. PROFIT!
27. oDesk is not a spam relay anymore
– use constant EX_SUCCESS => 0;
– use constant EX_TEMPFAIL => 0;
– use constant EX_UNAVAILABLE => 0;
• Does anybody have a better idea?
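Returning 0 for everything stops the bounces but also throws away the tracking signal from slide 22. One possible way to keep both, offered as a sketch rather than as what oDesk did, is to write the internal status to a log and still hand 0 back to Postfix. The log format, the outcome names, and the helpers `log_line` and `finish` are all hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Internal statuses, kept for the log only (hypothetical outcome names).
my %INTERNAL_STATUS = (posted => 0, backend_down => 75, blocked => 69);

# Build one log line; the format is an assumption for illustration.
sub log_line {
    my ($outcome, $message_id, $now) = @_;
    my $status = exists $INTERNAL_STATUS{$outcome}
               ? $INTERNAL_STATUS{$outcome} : 75;
    my $stamp = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($now));
    return "$stamp status=$status outcome=$outcome msgid=$message_id";
}

# Record what really happened, then report success so no bounce is generated.
sub finish {
    my ($log_fh, $outcome, $message_id, $now) = @_;
    print {$log_fh} log_line($outcome, $message_id, $now), "\n";
    return 0;
}

my $code = finish(\*STDERR, 'blocked', '<abc@example.com>', time);
print "exit code for Postfix: $code\n";     # prints "exit code for Postfix: 0"
```

Note that a temporary failure reported as 0 is never retried by Postfix, so that case would still lose mail unless the script queues it itself.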