5. Message- Central Data Channel
[Diagram: Game Server 1, Logs, Hadoop, ..., User Tracking, Security, Search, Social Graph, Rules/Recommendation Engine, Vertica, Ops and Data Warehouse, connected through a central message queue]
Message Queue
6. Topic-
A string identifying a message stream
Partition-
Logical division of a topic within a broker, to which
the logs generated by producers are written.
e.g.: kafkaTopic - kafkaTopic1, kafkaTopic2
Replica-Sets-
Replicas of a given partition within the broker cluster, with a
leader (writes) and followers (which read from the leader),
balanced using hash-key modulo.
Kafka Jargon
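The "hash-key modulo" balancing mentioned above can be sketched as follows. This is a minimal illustration, not Kafka's own partitioner: the function name `choose_partition` is hypothetical, and `zlib.crc32` is used only as a deterministic stand-in for whatever hash the real producer applies.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index via hash(key) mod N."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 is an illustrative, deterministic hash (an assumption);
    # the real producer may use a different hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # The same key always maps to the same partition.
    for key in ["user-1", "user-2", "user-3"]:
        print(key, "->", choose_partition(key, 4))
```

Because the mapping depends only on the key and the partition count, all messages with the same key land in the same partition, while different keys spread across partitions.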
11. Latency vs Durability
Ack Status      Time to publish (ms)    Tradeoff
No ack          0.7                     Greater risk of data loss
Wait for ack    1.5                     Lower risk of data loss
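As a rough worked example of what these publish times imply, the sketch below converts them into an upper bound on messages per second. It assumes a single sender that publishes one message at a time and blocks until each publish completes, with no batching; that assumption is not stated on the slide, and `max_sync_rate` is a hypothetical helper.

```python
def max_sync_rate(publish_ms: float) -> float:
    """Upper bound on messages/second for one blocking, unbatched sender."""
    if publish_ms <= 0:
        raise ValueError("publish_ms must be positive")
    return 1000.0 / publish_ms


NO_ACK_MS = 0.7    # from the table above
WITH_ACK_MS = 1.5  # from the table above

if __name__ == "__main__":
    print(f"no ack:       {max_sync_rate(NO_ACK_MS):.0f} msg/s")
    print(f"wait for ack: {max_sync_rate(WITH_ACK_MS):.0f} msg/s")
    print(f"slowdown:     {WITH_ACK_MS / NO_ACK_MS:.2f}x")
```

Under that assumption, waiting for the ack roughly halves the per-sender rate in exchange for the lower risk of data loss.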
12. 1. Create a Kafka-Replica Sink
2. Feed data to Copier via Kafka-Sink
3. Benchmark Kafka-Copier-Vertica Pipeline
4. Improve/Refactor for Performance
Integration Plan
13. C - Consistency (producer sync mode)
A - Availability (replication)
P - Partition tolerance (cluster in the same
network, so no network delay)
Strongly consistent and
highly available
CA-P theorem
16. Leader:
➢ Message is propagated to the followers
➢ Commit offset is checkpointed to disk
Follower failure and recovery:
➢ Failed follower is dropped from the in-sync replica set (ISR)
➢ After restart, it truncates its log to the last commit
➢ It catches up with the leader, then rejoins the ISR
Error Handling: Follower Failure
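The recovery steps above can be modelled with a small in-memory sketch. It is a simplification for illustration only: the `Replica` class and `recover_follower` function are hypothetical, logs are plain Python lists, and the catch-up happens in one step rather than through repeated fetches.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Replica:
    name: str
    log: List[str] = field(default_factory=list)
    committed: int = 0  # number of log entries known to be committed


def recover_follower(follower: Replica, leader: Replica, isr: Set[str]) -> None:
    """Truncate to the last commit, catch up with the leader, rejoin the ISR."""
    # 1. Drop everything after the follower's last checkpointed commit.
    del follower.log[follower.committed:]
    # 2. Copy the entries the follower is missing from the leader's log.
    follower.log.extend(leader.log[len(follower.log):])
    follower.committed = leader.committed
    # 3. Only a fully caught-up follower is added back to the ISR.
    if follower.log == leader.log:
        isr.add(follower.name)


if __name__ == "__main__":
    leader = Replica("broker-1", ["m0", "m1", "m2", "m3"], committed=4)
    # "x2" stands for an uncommitted entry written before the crash.
    follower = Replica("broker-2", ["m0", "m1", "x2"], committed=2)
    isr = {"broker-1"}
    recover_follower(follower, leader, isr)
    print(follower.log, sorted(isr))
```

Truncating first matters: the uncommitted entry is discarded, so the follower's log cannot diverge from the leader's after catch-up.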
17. ➢ Embedded controller detects leader
failure via ZooKeeper (ZK)
➢ New leader is elected from the ISR
➢ Committed messages are not lost
Error Handling: Leader Failure
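A minimal sketch of electing a new leader from the ISR is shown below. The slide only says the leader comes from the ISR; picking the first live ISR member is an assumption made here for simplicity, and `elect_leader` is a hypothetical name.

```python
from typing import List, Optional, Set


def elect_leader(isr: List[str], alive: Set[str]) -> Optional[str]:
    """Return the first live replica in the ISR, or None if none is alive."""
    for broker in isr:
        if broker in alive:
            return broker
    # No in-sync replica is available. Choosing a replica outside the ISR
    # here could lose committed messages, so this sketch refuses to do so.
    return None


if __name__ == "__main__":
    isr = ["broker-1", "broker-2", "broker-3"]
    alive = {"broker-2", "broker-3"}  # broker-1 (the old leader) has failed
    print(elect_leader(isr, alive))
```

Since every ISR member already holds all committed messages, any of them can take over without losing committed data.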
18. - Horizontally scalable
Add partitions in the broker cluster as higher
throughput is needed (~10 Mb/s per server)
- Balancing for producers and consumers
Scalability
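As a back-of-the-envelope sizing example, the sketch below estimates how many servers a target throughput needs. It assumes throughput scales linearly with server count and ignores replication overhead, neither of which the slide states; `servers_needed` is a hypothetical helper, and the default of 10 uses the per-server figure from the slide in whatever unit the target is given.

```python
import math


def servers_needed(target_rate: float, per_server_rate: float = 10.0) -> int:
    """Smallest server count whose combined rate covers the target.

    Both rates must be in the same unit. Linear scaling is assumed.
    """
    if target_rate < 0 or per_server_rate <= 0:
        raise ValueError("rates must be non-negative and per_server_rate positive")
    return math.ceil(target_rate / per_server_rate)


if __name__ == "__main__":
    for target in (10, 25, 100):
        print(target, "->", servers_needed(target), "server(s)")
```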
19. • Number of messages by which the consumer lags behind the producer
• Max lag in messages between follower and leader replicas
• Unclean leader election rate
• Whether the controller is active on the broker
Monitoring via JMX
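The consumer lag metric can be illustrated with a small calculation. This sketch does not query JMX; it assumes the per-partition offsets have already been collected into plain dictionaries, and the function names are hypothetical.

```python
from typing import Dict


def consumer_lag(log_end: Dict[int, int], consumed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: latest produced offset minus the consumer's offset."""
    # A partition the consumer has not read yet counts from offset 0.
    return {p: max(end - consumed.get(p, 0), 0) for p, end in log_end.items()}


def total_lag(log_end: Dict[int, int], consumed: Dict[int, int]) -> int:
    """Total number of messages the consumer is behind, across partitions."""
    return sum(consumer_lag(log_end, consumed).values())


if __name__ == "__main__":
    log_end = {0: 100, 1: 50, 2: 70}   # latest offset written per partition
    consumed = {0: 90, 1: 50}          # consumer offsets; partition 2 unread
    print(consumer_lag(log_end, consumed))
    print(total_lag(log_end, consumed))
```

A lag that keeps growing over time suggests the consumer cannot keep up with the producer.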
20. 3-node cluster: ~25 MB/s, < 20 ms latency
(end-to-end),
with a replication factor of 3 and 6
consumer groups
Monthly operations:
$500 + $250 (3-node ZooKeeper) + 1 man-month
Operation Costs
21. - No callback in producer’s send()
- Balancing is not fully automatic: a script
must be run manually
- Balancing layer of topics via ZK
Nothing is perfect
22. • Infinite scaling at 10 Mb/s per server with
< 10 ms latency
• Costs < $1000/month + 1 man-month
• No impact on the current pipeline
• 0.8 final release is due within days, to be used
in production
Summary