I’ve learned how to build distributed systems the hard way; I’ve failed, and failed again. I’ve made many of the common mistakes and tried a few other things that turned out to be disappointments. You shouldn't have to make those mistakes too. In this talk I'll tell the story of how I built a real-time advertising analytics platform that tracks and reports on millions of impressions every day, and all the things I did wrong before I got it to work. I’ll also tell you what I did right, and the choices I don’t regret.
9. TRACKING
AD IMPRESSIONS
track page views and all their ads
track visibility and send updates on changes
track events, track activity, sync cookies,
and track visits
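A minimal sketch of "send updates on changes", in Python. The class name `VisibilityTracker`, the `send` callback, and the event fields are all hypothetical and not taken from the talk; the point is only that a repeated observation of the same state produces no traffic.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class VisibilityTracker:
    """Reports an ad's visibility only when it changes (hypothetical helper)."""
    ad_id: str
    send: Callable[[dict], None]  # stand-in for whatever ships events to the backend
    _state: Optional[str] = field(default=None, init=False)

    def observe(self, state: str, timestamp_ms: int) -> bool:
        """Record an observation; emit an update only if the state changed."""
        if state == self._state:
            return False  # nothing changed, so nothing is sent
        self._state = state
        self.send({"ad_id": self.ad_id, "state": state, "timestamp": timestamp_ms})
        return True


# Usage: four observations, but the repeated VISIBLE is not sent again.
sent = []
tracker = VisibilityTracker("example", sent.append)
for ts, state in [(0, "LOADED"), (100, "VISIBLE"), (200, "VISIBLE"), (900, "HIDDEN")]:
    tracker.observe(state, ts)
```

Sending only on transitions keeps the event volume proportional to what the user actually does, rather than to how often the page polls.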
10. [Diagram: sequence of ad states: LOADED, VISIBLE, HIDDEN, LOADED, VISIBLE]
11. ASSEMBLING
SESSIONS
assemble ad impressions, page views and visits,
to be able to calculate things like total visible duration
mix in demographics, revenue, and third-party data
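One way the assembly step could look, as a rough Python sketch. It assumes each ad impression arrives as a list of `(timestamp_ms, state)` events; the function names, the `"CLICK"` state, and the sample timestamps are invented for illustration and are not the talk's actual implementation.

```python
def total_visible_duration(events):
    """Sum the milliseconds between each VISIBLE event and the event after it.

    `events` is a list of (timestamp_ms, state) tuples for one ad impression.
    A VISIBLE interval with no following event is not counted.
    """
    total = 0
    visible_since = None
    for timestamp_ms, state in sorted(events):
        if visible_since is not None:
            # Any event after VISIBLE closes the open visible interval.
            total += timestamp_ms - visible_since
            visible_since = None
        if state == "VISIBLE":
            visible_since = timestamp_ms
    return total


def assemble_session(session_id, events, extra=None):
    """Build one session record and mix in extra data (demographics, revenue, ...)."""
    record = {
        "session_id": session_id,
        "visible_duration": total_visible_duration(events),
        "click": any(state == "CLICK" for _, state in events),
    }
    record.update(extra or {})
    return record


# Usage with made-up timestamps: visible for 800 ms, then for 540 ms.
events = [(0, "LOADED"), (100, "VISIBLE"), (900, "HIDDEN"),
          (1500, "VISIBLE"), (2040, "HIDDEN")]
session = assemble_session("MAI3QAGNAIYT", events, {"browser": "Chrome"})
```

Here `session["visible_duration"]` is 1340. Sorting the events first is a deliberate choice in this sketch, since updates from browsers may well arrive out of order.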
12. [Diagram: event labels (LOADED, VISIBLE, HIDDEN, ACTIVE, A CLICK!) and 3rd PARTY DATA & OTHER GOODIES, alongside an example session record]
{
  "user_id": "M9L6R5TD0YXK",
  "session_id": "MAI3QAGNAIYT",
  "timestamp": 1347896675038,
  "placement_name": "example",
  "category": "frontpage",
  "embed_url": "http://example.com/",
  "visible_duration": 1340,
  "browser": "Chrome",
  "device_type": "computer",
  "click": true,
  "ad_dimensions": "980x300"
}
53. DO YOU REALLY
NEED A BACKUP?
if you have 3x replication across multiple
availability zones, is that backup really worth it?
54. PRODUCTION IS THE
ONLY REAL TEST
ENVIRONMENT
when thousands of things happen every second,
new, weird and unforeseen things happen all the time,
no test can anticipate everything
(but testing is good anyway; just don’t think you’ve got everything covered)