5. What we need
• Asynchronous Job processing System
Collect Data
Processing Data
Response
Collect Data
Processing DataResponse
Workers
6. What we need
• Asynchronous Job processing System
• Batch Job
• Big data job
• High Reliable: No job missed
• Distributed job processing workers
• High performance
• Persistent
• Load balancing, Failed over, Recoverable
7. Open-source solutions
• Share-memory workers
• All workers in one physical server
• No fail-over
• Un-scalable
• Gearman
• Good but not completely fit our requirement
• No Batch Job support
• Not full reliable (lost job)
• Not full load-balance
• Un-stable if more than 2000 jobs/second
8. Zalo Asyn Job Processing
System
Client
Client
Worker 1
Worker 2
Worker 3
Z Database
Short Connection
Long Connection
TCP
TCP
Worker
Manager
Job
Caching
Job
Manager
Persistent
Manager
Job
Clean-Up
Job Server
TCP
TCP
TCP
9. Implementation
• C/C++ for Job Server
• C/C++, Java for client and workers
• Binary Protocol
• Z-Database
13. Applications
• Using for all Asynchronous job processing in Zalo: voice
upload, image upload, feed processing…
• Benchmark (single server)
• 50K images/seconds (640x480)
• 50k voices/seconds (30s)
• Advantages
• Batch Jobs
• Never lost job
• Worker can restart or stop any time
• Fail-over, Load Balancing, Quick recover in failure
• Issue
• Job duplication (handled by worker)