How Zhaopin built its Event Center using Apache Pulsar

Zhaopin.com is a Chinese online recruitment services provider. As a bilingual job board, Zhaopin.com has one of the largest selections of job vacancies in China, from both prominent local and foreign companies. It has over 2.2 million clients, and its average daily page views exceed 68 million.

Zhaopin.com had used RabbitMQ for years as its enterprise event bus. As the company grew, the volume of data grew and the use cases became increasingly varied, which made the original RabbitMQ-based architecture hard to scale, maintain, and operate. In 2018, Zhaopin.com chose Apache Pulsar to replace its RabbitMQ-based architecture.

Sijie Guo and Penghui Li describe the advantages of Pulsar over RabbitMQ and explain how Pulsar meets the requirements of high durability, high throughput, and low latency. Finally, they detail how Pulsar was put into production at Zhaopin.com and share use cases and their experience operating Pulsar at scale.

Published in: Technology

  1. How Zhaopin built its Event Center using Apache Pulsar (Penghui Li, Sijie Guo)
  2. Zhaopin.com: Zhaopin.com is the biggest online recruitment service provider in China. It provides job seekers with a comprehensive résumé service, the latest employment and career-development information, and in-depth online job search for positions throughout China. It provides professional HR services to over 2.2 million clients, and its average daily page views exceed 68 million.
  3. Who we are: Penghui Li
      - Tech lead of the infrastructure team at zhaopin.com
      - 5+ years of experience developing message queues and microservices
      - Apache Pulsar Committer
  4. Who we are: Sijie Guo
      - Apache Pulsar Committer and PMC Member
      - Apache BookKeeper Committer and PMC Member
      - Interested in technologies around event streaming
      - Previously worked at Twitter and Yahoo
  5. Agenda
      1. Why build an Event Center
      2. Why Apache Pulsar
      3. Apache Pulsar at Zhaopin
      4. Streaming Platform
      5. Zhaopin's contributions to Apache Pulsar
  6. Why build an Event Center: from data silos to a unified platform
  7. Data silos: MSMQ (serving enterprises), RabbitMQ (serving end users), and Kafka (data processing). Pain points:
      - High maintenance cost
      - Extremely hard to share data across teams
      - Inconsistency between data silos
      - Doesn't scale
      - No consistent SLA
  9. Unification: MQService. Online services (Submission Service, Resume Service, Job Search) reach an MQService layer over Thrift, HTTP, and MQTT; MQService fronts multiple RabbitMQ clusters.
      Problems solved:
      - Simplified operations
      - Scale-out service
      - High availability
      Problems unsolved:
      - Keeping messages for a longer period
      - Data rewind
      - Ordering guarantees
  10. Unification: MQService. Online services go through MQService, while data processing stays on Kafka.
  11. Why build an Event Center: [Diagram. Left: a single queue holding messages 0–3, shared by Consumer-1, Consumer-2, Consumer-3, and a new consumer; caption "Better consumption parallelism". Right: Partition-0, Partition-1, and Partition-2, each holding messages 0–3 consumed in order (0,1,2,3), plus a new consumer; caption "Better order guarantee".]
  12. Why build an Event Center: RabbitMQ is better for work-queue use cases, because adding consumers increases consumption throughput. Kafka needs more partitions to increase consumption. We used RabbitMQ heavily for work-queue use cases.
  13. Why build an Event Center: Kafka integrates well with the data-processing ecosystem (Flink, Spark) and provides high throughput. We used Kafka heavily for data processing.
  14. Why build an Event Center: The cost of operating two different messaging systems is high, and data sits in two different silos. We need a unified platform that handles both scenarios.
  15. Why Apache Pulsar: Pulsar == Messaging + Storage
  16. What is Apache Pulsar: "Flexible pub/sub messaging backed by durable log/stream storage"
  17. Apache Pulsar: Multi-tenancy
  18. Apache Pulsar: Queue + Streaming
  19. Apache Pulsar: Cloud native (layered architecture)
      - Independent scalability
      - Instant failure recovery
      - Balance-free cluster expansion
  20. Why Apache Pulsar
      1. Pulsar provides a better abstraction of consumption patterns.
      2. Pulsar provides better fault-tolerance and consistency options.
      3. Pulsar uses a scalable storage system (Apache BookKeeper).
      4. Pulsar offers hierarchical topic management and resource isolation.
      A perfect match for our requirements.
  21. Apache Pulsar at Zhaopin: 20+ core services, 6 billion messages/day
  22. Unification: Apache Pulsar. Online services (queue) and data processing (streaming) both run on Apache Pulsar.
      Problems solved:
      - No data silos
      - Queue + streaming
      - Disaster recovery
      - Infinite message storage (via tiered storage)
      - Data rewinding
  23. Milestones
      - 2018/07: POC
      - 2018/09: Pulsar in production
      - 2018/10: Pulsar-based Event Center, 1 billion msgs/day
      - 2018/11: Won the best innovative platform award at Zhaopin
      - 2018/12: 3 billion msgs/day
      - 2019/02: 6 billion msgs/day
  24. Core metrics
      - 50+ namespaces
      - 3,000+ topics
      - 6+ billion messages per day
      - 3 TB of storage per day
      - 20+ core services
  25. System metrics
      - Latency: 99.5% < 5 ms
      - Write: 100K+/s
      - Read: 200K+/s
      - Network in: 190+ MB/s
      - Network out: 550+ MB/s
  26. Pulsar at Zhaopin
      1. One copy of data: a single source of truth.
      2. No need to worry about data consistency between RabbitMQ and Kafka.
      3. Multi-tenancy makes topic management easier.
      4. Strong data durability means we no longer worry about message loss.
  27. Streaming Platform: beyond an Event Center
  28. Streaming Platform: [Diagram: a streaming layer built on Pulsar, with Flink, Hive, and Pulsar SQL, backed by tiered storage on S3, HDFS, and OSS.]
  29. Stream to stream: [Diagram of processing patterns: Stream -> Stream, Stream -> Table, Table -> Stream, Table -> Table.]
  30. Unified data processing: [Diagram: stream processing over several Pulsar topics, together with Hive.]
  31. Contributing to Apache Pulsar
  32. Zhaopin's contributions to Pulsar
      - Client interceptors (we use this feature to track messages between producers and consumers)
      - Dead letter topic
      - Time-partitioned message tracker
      - Service URL provider (we use this feature to switch traffic dynamically)
      - Hive-Pulsar integration
      - Multi-version schema
      - And more…
  33. Thank you
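Slides 11 and 12 contrast two consumption models: a shared work queue, where adding consumers adds parallelism, and partitioned consumption, where parallelism is capped by the partition count but each partition is read in order by one owner. The Python sketch below models both with plain data structures. It is a toy simulation, not Pulsar, RabbitMQ, or Kafka client code; the round-robin dispatch and the one-consumer-per-partition assignment are simplifying assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shared_queue_dispatch(messages: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Work-queue model: all consumers pull from one queue (round-robin here)."""
    assignment: Dict[str, List[int]] = defaultdict(list)
    for i, msg in enumerate(messages):
        assignment[consumers[i % len(consumers)]].append(msg)
    return dict(assignment)


def partitioned_dispatch(
    partitions: Dict[str, List[int]], consumers: List[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Partitioned model: each partition is owned by exactly one consumer.

    Partitions are assigned round-robin. Consumers beyond the partition count
    get nothing, so adding consumers alone does not add parallelism, but every
    partition is read in order by a single owner.
    """
    assignment: Dict[str, List[Tuple[str, int]]] = {c: [] for c in consumers}
    for index, name in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].extend((name, msg) for msg in partitions[name])
    return assignment


if __name__ == "__main__":
    consumers = ["consumer-1", "consumer-2", "consumer-3", "new-consumer"]
    # Queue model: the new consumer immediately takes a share of the work.
    print(shared_queue_dispatch(list(range(8)), consumers))
    # Partition model with 3 partitions: the fourth consumer stays idle.
    parts = {f"partition-{p}": [0, 1, 2, 3] for p in range(3)}
    print({c: len(v) for c, v in partitioned_dispatch(parts, consumers).items()})
```

Roughly speaking, Pulsar's `Shared` subscription type corresponds to the left-hand model and its `Exclusive` and `Failover` types to the right-hand one, which is presumably what slide 18 means by "Queue + Streaming" on the same topics.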
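Slide 20 cites hierarchical topic management and resource isolation, and slide 24 counts 50+ namespaces and 3,000+ topics. In Pulsar, fully qualified topic names have the form `persistent://tenant/namespace/topic`, and many policies (for example retention and backlog quotas) are configured per namespace. The Python sketch below parses that form and groups topics by namespace. The tenant, namespace, and topic names are hypothetical, and the parser deliberately ignores Pulsar's other accepted forms, such as short topic names and partition suffixes.

```python
from typing import Dict, Iterable, List, NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str      # top level of the hierarchy
    namespace: str   # policy scope inside a tenant
    local_name: str  # the topic itself


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified name of the form domain://tenant/namespace/topic."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic name: {name!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, parts[0], parts[1], parts[2])


def group_by_namespace(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group topics by tenant/namespace, the level where many policies apply."""
    groups: Dict[str, List[str]] = {}
    for topic in map(parse_topic, names):
        groups.setdefault(f"{topic.tenant}/{topic.namespace}", []).append(topic.local_name)
    return groups


if __name__ == "__main__":
    # Hypothetical topic names, for illustration only.
    topics = [
        "persistent://resume/events/resume-updated",
        "persistent://resume/events/resume-viewed",
        "persistent://jobs/search/query-log",
    ]
    print(group_by_namespace(topics))
```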
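Slides 21, 24, and 25 mix daily totals with per-second rates. A back-of-envelope conversion shows how they relate. The sketch assumes decimal units (1 TB = 10^12 bytes) and that the 3 TB/day figure counts each message once; if it includes storage replicas, the average payload per message would be smaller. The results are derived averages, not measurements reported in the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Inputs from slide 24: "6+ billion messages per day" (a lower bound) and
# "3TB storage per day" (assumed: decimal terabytes, payload counted once).
MESSAGES_PER_DAY = 6e9
BYTES_PER_DAY = 3e12


def per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


avg_msgs_per_sec = per_second(MESSAGES_PER_DAY)
avg_mb_per_sec = per_second(BYTES_PER_DAY) / 1e6
avg_msg_bytes = BYTES_PER_DAY / MESSAGES_PER_DAY

if __name__ == "__main__":
    print(f"average ingest:       {avg_msgs_per_sec:,.0f} msgs/s")  # 69,444 msgs/s
    print(f"average storage rate: {avg_mb_per_sec:.1f} MB/s")      # 34.7 MB/s
    print(f"average message size: {avg_msg_bytes:.0f} bytes")      # 500 bytes
```

The daily average of roughly 69K messages/s is below the 100K+/s write figure on slide 25; the slides do not say whether that figure is a peak or a sustained rate.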
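Slide 32 lists the dead letter topic among Zhaopin's contributions. In Pulsar's Java client, a consumer can be given a `DeadLetterPolicy` whose `maxRedeliverCount` bounds how often a failing message is redelivered before it is moved to a separate topic, named `<topic>-<subscription>-DLQ` by default. The Python sketch below is a toy model of that flow, not the client implementation: the `Message` and `DeadLetterSimulator` classes and the `resume-events`/`indexer` names are hypothetical, and the exact moment Pulsar moves a message is simplified to "after `max_redeliver_count` redeliveries have also failed".

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Message:
    msg_id: int
    payload: str
    redelivery_count: int = 0


@dataclass
class DeadLetterSimulator:
    """Hypothetical toy model: consume -> fail -> redeliver -> dead-letter."""
    topic: str
    subscription: str
    max_redeliver_count: int
    dead_letters: List[Message] = field(default_factory=list)

    @property
    def dead_letter_topic(self) -> str:
        # Mirrors Pulsar's documented default name: <topic>-<subscription>-DLQ
        return f"{self.topic}-{self.subscription}-DLQ"

    def run(self, messages: Iterable[Message], handler: Callable[[Message], bool]) -> List[int]:
        """Deliver messages until each is acknowledged or dead-lettered.

        Returns the ids of acknowledged messages in acknowledgment order.
        """
        pending = deque(messages)
        acked: List[int] = []
        while pending:
            msg = pending.popleft()
            if handler(msg):
                acked.append(msg.msg_id)        # success: acknowledge
            elif msg.redelivery_count < self.max_redeliver_count:
                msg.redelivery_count += 1       # failure: schedule a redelivery
                pending.append(msg)
            else:
                self.dead_letters.append(msg)   # retries exhausted: park in the DLQ
        return acked


if __name__ == "__main__":
    sim = DeadLetterSimulator("persistent://public/default/resume-events", "indexer", 3)
    batch = [Message(1, "ok"), Message(2, "bad"), Message(3, "ok")]
    acked = sim.run(batch, lambda m: m.payload != "bad")
    print(acked, [m.msg_id for m in sim.dead_letters], sim.dead_letter_topic)
```

With `max_redeliver_count=3`, the failing message is handled four times (the original delivery plus three redeliveries) before it is parked, while the healthy messages are acknowledged without being blocked behind it.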
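Slide 29 names the stream/table conversions without detail. The generic idea is that a stream of keyed updates can be folded into a table of latest values, and the difference between two versions of a table can be emitted back as a stream of changes. The Python sketch below illustrates only that duality with in-memory dicts. It is not Zhaopin's pipeline, which per slide 28 involves Pulsar, Flink, Hive, and Pulsar SQL, and keys such as `job-1` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Change = Tuple[str, Optional[str]]  # (key, new value); None marks a deletion


def stream_to_table(changes: Iterable[Change]) -> Dict[str, str]:
    """Stream -> Table: fold a changelog into the latest value per key."""
    table: Dict[str, str] = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def table_to_stream(old: Dict[str, str], new: Dict[str, str]) -> List[Change]:
    """Table -> Stream: emit the changes that turn `old` into `new`."""
    changes: List[Change] = [(k, v) for k, v in sorted(new.items()) if old.get(k) != v]
    changes += [(k, None) for k in sorted(old) if k not in new]
    return changes


if __name__ == "__main__":
    updates = [("job-1", "open"), ("job-2", "open"), ("job-1", "closed"), ("job-2", None)]
    print(stream_to_table(updates))
    print(table_to_stream({"job-1": "open"}, {"job-1": "closed", "job-3": "open"}))
```

Replaying the emitted changes on top of the old table reproduces the new one, which is the property that lets the two representations stand in for each other.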