This document summarizes AcuityAds' experience using Apache Kafka and Storm to process over 10 billion ad impressions per day. It describes an architecture in which Kafka ingests bid request data from multiple sources into partitioned topics, while Storm topologies read from Kafka and compute metrics such as daily impressions by site. Early issues included unbalanced Kafka partition distribution and low Storm uptime caused by uncaught exceptions; planned improvements include version upgrades and added monitoring.
Target and Connect Intelligently with Kafka & Storm
1. Target and Connect Intelligently
Experience with Kafka & Storm
Otto Mok
Solution Architect, AcuityAds
April 30, 2014 – Toronto Hadoop User Group
2. 2
Agenda
• Background
– What does AcuityAds do?
• Use case
– What are we trying to do?
• High-level System Architecture
– How does the data flow?
• Kafka & Storm
– What did we do wrong?
4. 4
Background
• Digital Advertising
– Website banners, pre-roll video, free mobile apps
• Buy ad impressions in real time
– Respond within 50 ms to each auction
• Find the best match between people and ads
– Show ads that you care about
• Use machine learning algorithms to ‘learn’
– Data, data, data
5. 5
Use case
• 10+ billion daily impressions
• 30,000+ new sites daily
• How many daily impressions by site?
• How are the impressions distributed?
– Country, Province, Gender, Age Range, etc.
6. 6
High-level System Architecture
• 10+ billion daily bid requests
• Make up to 4 billion daily bids
• Serve millions of daily impressions
• 10+ TB of messages daily
• 300K+ messages / second
[Architecture diagram: Bidder, Adserver → Kafka → Storm → HBase/Hadoop]
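To make the ingest path concrete, here is a minimal sketch of a Kafka 0.8-style producer such as the bidder might run. The topic name BIDREQUEST comes from the partition paths shown later in the deck; the broker hostnames, key choice, and payload are assumptions.

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class BidRequestPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Broker hostnames are assumptions; 0.8 producers bootstrap from this list.
            props.put("metadata.broker.list", "kafka01:9092,kafka02:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("request.required.acks", "1"); // ack from the partition leader only

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            // Keying by sourceId (an assumption) routes one source to one partition.
            producer.send(new KeyedMessage<String, String>(
                    "BIDREQUEST", "sourceId-123", "{ ...bid request JSON... }"));
            producer.close();
        }
    }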
11. 11
Kafka - Issues
• Issue 1 - Partitions
– 10 partitions
– Each partition grows by > 1 TB a day
– 100 TB of cluster storage for ~1 TB per partition – no problem!
• But each partition is stored in a single directory, i.e. on a single disk
– /disk05/kafka-logs/BIDREQUEST-09
– /disk09/kafka-logs/BIDREQUEST-03
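Because a partition directory cannot span disks, the usual mitigation is to list one log directory per physical disk; Kafka then places each new partition in the directory currently holding the fewest partitions. A sketch of server.properties with the disk layout assumed:

    # server.properties – one log directory per physical disk (layout assumed)
    log.dirs=/disk01/kafka-logs,/disk02/kafka-logs,/disk05/kafka-logs,/disk09/kafka-logs
    # Each partition still lives entirely inside one of these directories,
    # so partition count x retention must be sized to fit a single disk.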
12. 12
Kafka - Issues
• Issue 2 – Unbalanced partition distribution
– Some servers run out of disk space
– Some servers are not the “leader” for any partition
• A network glitch causes a server to drop out of the
cluster; after rejoining it is no longer a leader
• Fix: auto.leader.rebalance.enable=true
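With that flag set, the controller periodically moves leadership back to each partition's preferred replica. A sketch of the related broker settings (the values shown are the stock defaults, cited from memory), plus the manual fallback tool that ships with Kafka:

    # server.properties
    auto.leader.rebalance.enable=true
    leader.imbalance.check.interval.seconds=300
    leader.imbalance.per.broker.percentage=10

    # Manual alternative after a broker rejoins the cluster:
    bin/kafka-preferred-replica-election.sh --zookeeper zk01:2181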
13. 13
Lots of data – now what?
Source: http://bookriotcom.c.presscdn.com/wp-content/uploads/2013/03/server-farm-shot.jpg
14. 14
Use case - again
• 10+ billion daily impressions
• 30,000+ new sites daily
• How many daily impressions by site?
• How are the impressions distributed?
– Country, Province, Gender, Age Range, etc.
18. 18
Storm - Topology
• Spout reads each BidRequest from the Kafka topic
• Determines whether the inventory is new or existing,
emits tuples to different “streams” (sketched below)
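The deck has the spout itself making the new-vs-existing decision; with the stock KafkaSpout that decision more naturally lives in a small splitter bolt, so this sketch models it that way (the class, the field layout, and the new-inventory test are all assumptions; the two stream names come from slide 19).

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Hypothetical splitter bolt: routes each BidRequest to one of two named streams.
    public class InventorySplitterBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            // Field layout is an assumption about what the spout's scheme emits.
            String sourceId = input.getStringByField("sourceId");
            String domainName = input.getStringByField("domainName");
            String inventoryId = input.getStringByField("inventoryId");
            if (inventoryId == null || inventoryId.isEmpty()) {
                // Never seen before: no inventoryId assigned yet.
                collector.emit("NewInventory", new Values(sourceId, domainName));
            } else {
                collector.emit("ExistingInventory", new Values(inventoryId));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declareStream("NewInventory", new Fields("sourceId", "domainName"));
            declarer.declareStream("ExistingInventory", new Fields("inventoryId"));
        }
    }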
19. 19
Storm - Topology
• InsertInventoryBolt
– Processes tuples from the NewInventory stream
– Fields grouping on sourceId, domainName
– Tick tuple every 1 second
• UpdateInventoryBolt
– Processes tuples from the ExistingInventory stream
– Fields grouping on inventoryId
– Tick tuple every 1 second
(wiring sketched below)
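Slides 18–20 together imply a topology wiring like the following. This is a minimal sketch using the storm-kafka spout classes (ZkHosts/SpoutConfig/KafkaSpout from the storm-kafka-0.8-plus module); hostnames, component IDs, and parallelism hints are assumptions, the bolt names are the deck's own (the splitter is sketched above, UpdateInventoryBolt below), and LogInventoryBolt from slide 20 would be attached the same way.

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.ZkHosts;

    public class InventoryTopology {
        public static void main(String[] args) throws Exception {
            // Read the BIDREQUEST topic; ZooKeeper host and consumer id are assumptions.
            SpoutConfig spoutConfig = new SpoutConfig(
                    new ZkHosts("zk01:2181"), "BIDREQUEST", "/kafka-spout", "bidrequest-reader");

            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("bidRequests", new KafkaSpout(spoutConfig), 10);
            builder.setBolt("splitter", new InventorySplitterBolt(), 20)
                   .shuffleGrouping("bidRequests");
            // Fields groupings from slide 19: the same key always reaches the same executor.
            builder.setBolt("insertInventory", new InsertInventoryBolt(), 10)
                   .fieldsGrouping("splitter", "NewInventory",
                                   new Fields("sourceId", "domainName"));
            builder.setBolt("updateInventory", new UpdateInventoryBolt(), 10)
                   .fieldsGrouping("splitter", "ExistingInventory",
                                   new Fields("inventoryId"));

            Config conf = new Config();
            conf.setNumWorkers(10); // slide 21 mentions 10 workers
            StormSubmitter.submitTopology("inventory", conf, builder.createTopology());
        }
    }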
20. 20
Storm - Topology
• LogInventoryBolt
– Processes tuples from the ExistingInventory stream
– Fields grouping on inventoryId
– Tick tuple every 10 seconds
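All three bolts use the tick-tuple pattern: buffer updates in memory and flush them on a periodic system tuple. Here is a minimal sketch of what UpdateInventoryBolt could look like; the in-memory counter and the HBase flush are assumptions, but Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS and the Constants check are the standard Storm 0.9 mechanism.

    import java.util.HashMap;
    import java.util.Map;
    import backtype.storm.Config;
    import backtype.storm.Constants;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    public class UpdateInventoryBolt extends BaseRichBolt {
        private OutputCollector collector;
        private Map<String, Long> pendingCounts;

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            this.pendingCounts = new HashMap<String, Long>();
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            // Ask Storm to deliver a system "tick" tuple to this bolt every second (slide 19).
            Map<String, Object> conf = new HashMap<String, Object>();
            conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 1);
            return conf;
        }

        private boolean isTick(Tuple tuple) {
            return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
                    && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
        }

        @Override
        public void execute(Tuple tuple) {
            if (isTick(tuple)) {
                flush(); // write the batched counters downstream once per tick
            } else {
                String inventoryId = tuple.getStringByField("inventoryId");
                Long current = pendingCounts.get(inventoryId);
                pendingCounts.put(inventoryId, current == null ? 1L : current + 1);
            }
            collector.ack(tuple);
        }

        private void flush() {
            // HBase write omitted (an assumption); clear the in-memory batch afterwards.
            pendingCounts.clear();
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { /* terminal bolt */ }
    }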
21. 21
Storm - Issues
• Issue – Low uptime
– 10 workers, 100 executors
– Not processing many tuples
– Process latency < 10 ms
• Bolts restart due to uncaught exceptions
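In Storm 0.9, an exception that escapes execute() kills the hosting worker JVM, which the supervisor then restarts; that is what drags worker uptime down. One common defensive pattern is a base class (the GuardedBolt name below is hypothetical) that catches per-tuple exceptions, reports them to the Storm UI, and fails the tuple for replay instead of crashing the worker.

    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    // Hypothetical base class: subclasses implement process() and inherit the guard.
    public abstract class GuardedBolt extends BaseRichBolt {
        protected OutputCollector collector;

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        protected abstract void process(Tuple tuple) throws Exception;

        @Override
        public void execute(Tuple tuple) {
            try {
                process(tuple);
                collector.ack(tuple);
            } catch (Exception e) {
                collector.reportError(e); // visible in the Storm UI; worker keeps running
                collector.fail(tuple);    // the spout replays the tuple
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }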
22. 22
Conclusion
• Cost
– Bleeding-edge technology bugs
– Support only via mailing lists
– Roll-your-own monitoring
– Dedicated operations personnel
• Benefit
– Near real-time data on site impression volume &
distribution by geo, demographics, etc.
23. 23
Forward Looking
• Kafka v0.8.1.1
– Allows specifying the broker hostname advertised to
producers & consumers
– Allows changing the number of partitions of a topic online
• Storm v0.9.1
– Faster pure-Java Netty transport
– View logs from each server in the Storm UI
– Tick tuples using floating-point seconds
– Storm on Hadoop (HDP 2.1)
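A sketch of the two Kafka 0.8.1 features named above, with hostnames and counts assumed. One caveat: adding partitions changes the key-to-partition mapping, so keyed producers will start routing existing keys to different partitions.

    # server.properties – advertise a client-resolvable address (0.8.1+)
    advertised.host.name=kafka01.example.com
    advertised.port=9092

    # Grow an existing topic online (topic name and partition count assumed)
    bin/kafka-topics.sh --zookeeper zk01:2181 --alter --topic BIDREQUEST --partitions 20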