Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022

Starting with version 2.10, the Apache ZooKeeper dependency has been eliminated and replaced with a pluggable framework that lets you reduce the infrastructure footprint of Apache Pulsar by leveraging alternative metadata and coordination systems suited to your deployment environment. This talk walks through the steps required to use the etcd service already running inside Kubernetes as Pulsar's metadata store, eliminating the need to run ZooKeeper entirely and leaving you with a ZooKeeper-less Pulsar.

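In practice, the switch comes down to the broker metadata-store settings. A minimal sketch, assuming Pulsar 2.10+ broker.conf property names and an etcd endpoint reachable from the brokers; the service hostname and the exact URL scheme expected by the etcd driver are placeholders, not taken from the talk:

    # broker.conf: point both metadata stores at etcd instead of ZooKeeper
    # (the endpoint below is a placeholder for your in-cluster etcd service)
    metadataStoreUrl=etcd:http://etcd.default.svc.cluster.local:2379
    configurationMetadataStoreUrl=etcd:http://etcd.default.svc.cluster.local:2379

With both stores pointing at etcd, no ZooKeeper ensemble needs to be deployed alongside the cluster.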

  1. Towards a ZooKeeper-less Pulsar
      Tech Deep Dive
      Matteo Merli, CTO • StreamNative
      Pulsar Summit San Francisco, Hotel Nikko, August 18, 2022
  2. Matteo Merli, CTO, StreamNative
      ● Co-creator of Pulsar
      ● PMC Chair for Apache Pulsar
      ● Member of the Apache BookKeeper PMC
      ● Previously: Splunk, Streamlio, Yahoo
  3. Pulsar and Metadata
  4. Pulsar cluster overview
  5. Data Path
  6. Metadata Path
  7. Geo-Replication
  8. What is metadata?
  9. Examples of metadata
      ● Pointers to data
      ● Service discovery
      ● Distributed coordination
      ● System configuration
      ● Provisioning configuration
  10. Topics metadata
      Each persistent topic is associated with a Managed Ledger.
      The Managed Ledger keeps the list of ledgers for the topic:
      {
        "ledgers" : [ {
          "ledgerId" : 1234,
          "entries" : 1000,
          "size" : 433111,
          "offloaded" : false
        }, {
          "ledgerId" : 5579,
          "entries" : 50000,
          "size" : 9433111,
          "offloaded" : false
        } ],
        "schemaLedgers" : [ ],
        "compactedLedger" : {
          "ledgerId" : -1,
          "entries" : -1,
          "size" : -1,
          "offloaded" : false
        }
      }
  11. Ledger metadata
      ● Each BK ledger has associated metadata
      ● Contains:
        ○ State of the ledger
        ○ Which bookies have the data
      LedgerMetadata{
        formatVersion=3,
        ensembleSize=2,
        writeQuorumSize=2,
        ackQuorumSize=2,
        state=CLOSED,
        length=1738964,
        lastEntryId=1611,
        digestType=CRC32C,
        password=base64:,
        ensembles={
          0=[bookie-1:3181, bookie-2:3181],
          1000=[bookie-5:3181, bookie-2:3181]
        },
        customMetadata={
          component=base64:bWFuYWdlZC1sZWRnZXI=,
          pulsar/managed-ledger=base64:cHVibGlR=,
          application=base64:cHVsc2Fy
        }
      }
  12. Service discovery
      ● Find the list of available bookies
        ○ Which bookies are in read-only?
      ● Discover which broker owns a particular topic
      ● Find the list of available brokers
        ○ What is the current load on each broker?
  13. Distributed coordination
      ● Acquire a lock over a particular resource
        ○ Ownership of a group of topics
        ○ Signaling that some work on a particular resource is in progress
          ■ BK autorecovery
      ● Leader election
        ○ Establish a single leader designated to perform some tasks
          ■ The load manager designates a leader that makes the load-balancing decisions
        ○ Failover to other available nodes
  14. System configuration
      ● Allow for dynamic settings
      ● Features can be activated/deactivated without restarting brokers
      ● Keep isolation information
      ● Keep track of the (bookie → rack) mapping
  15. Provisioning configuration
      ● Metadata for tenants and namespaces
      ● Policies to apply to namespaces
      ● Authorization definitions
      ● Highly cacheable metadata
  16. What’s up with ZooKeeper?
  17. ZooKeeper
      ● Consensus-based “database”
        ○ Data is replicated consistently to a quorum of nodes
      ● It is not horizontally scalable
        ○ Increasing the ZK cluster size does not increase the write capacity
      ● All data is kept in memory on every node
        ○ Not very GC friendly
      ● It takes periodic snapshots of the entire dataset
  18. ZooKeeper
      ● The amount of metadata that can be comfortably stored in ZK is ~5 GB
      ● Tuning and operating ZK to work with big datasets is not trivial
        ○ Requires deep knowledge of ZK internals
      ● In cloud and containerized environments, leader election can sometimes take a few minutes
        ○ Issues with DNS, software-defined networking and sidecar TCP proxies
  19. Why would we take ZK away?
      ● Big clusters → we don’t want a hard limit on the amount of metadata
        ○ A horizontally scalable metadata store is better suited
      ● Small clusters → remove the overhead of running ZK
        ○ Fewer components to deploy
        ○ Easier operations
  20. PIP-45: A plan with multiple steps
  21. PIP-45: Pluggable metadata backend
      ● Instead of using the ZooKeeper APIs directly, we have abstracted all accesses through a single generic API
      ● This API has multiple implementations:
        ○ ZooKeeper
        ○ etcd
        ○ RocksDB (for standalone)
        ○ Memory (for unit tests)
  22. Metadata semantics
      We have identified 2 main patterns of access to the metadata:
      1. Simple key-value access + notifications
      2. Complex coordination
  23. Key-Value access
      ● MetadataStore → key-value store access
        ○ put() – get() – delete()
        ○ Values are byte[]
        ○ Users can register for notifications
      ● MetadataCache → object cache on top of MetadataStore
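
To make the MetadataStore surface concrete, here is a toy, in-memory sketch mirroring what the slide describes: put/get/delete over byte[] values plus change notifications. The real Pulsar interface is asynchronous and versioned, so the signatures below are simplifying assumptions, not the actual API.

    // Illustrative only: a toy, in-memory stand-in for the key-value surface
    // described above. The real Pulsar interface is asynchronous and versioned;
    // treat these signatures as assumptions for the sketch, not the actual API.
    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.function.Consumer;

    final class ToyMetadataStore {
        private final Map<String, byte[]> data = new ConcurrentHashMap<>();
        private final CopyOnWriteArrayList<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

        void put(String path, byte[] value) { data.put(path, value); notifyListeners(path); }

        Optional<byte[]> get(String path) { return Optional.ofNullable(data.get(path)); }

        void delete(String path) { data.remove(path); notifyListeners(path); }

        // Users can register for notifications on any change, as described on the slide.
        void registerListener(Consumer<String> onChange) { listeners.add(onChange); }

        private void notifyListeners(String path) { listeners.forEach(l -> l.accept(path)); }
    }

A MetadataCache would sit on top of such a store, deserializing the byte[] values into typed objects and invalidating cached entries when a notification arrives for the corresponding path.
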
  24. CoordinationService
      ● Contains primitives for “cluster coordination”
      ● High-level API that hides all the complexities:
        ○ ResourceLock – Distributed lock over a shared resource
        ○ LeaderElection – Elect a leader among a set of peers
        ○ DistributedCounter – Generate unique IDs
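
A caller-side sketch of how these primitives are meant to be used; the interface shapes and the paths below are assumptions made for the example, not the exact Pulsar API.

    // Illustrative only: the caller-side shape of the coordination primitives
    // named on the slide. Interface and method names are assumptions for the sketch.
    import java.util.concurrent.CompletableFuture;

    interface ResourceLock<T> extends AutoCloseable {
        T getValue();            // payload stored with the lock (e.g. the owner's identity)
        @Override void close();  // releases the lock
    }

    interface LeaderElection extends AutoCloseable {
        boolean isLeader();      // whether this participant currently holds leadership
        @Override void close();
    }

    interface CoordinationService {
        <T> CompletableFuture<ResourceLock<T>> acquireLock(String path, T value);
        LeaderElection startLeaderElection(String path, Runnable onBecomingLeader);
    }

    final class CoordinationExample {
        static void run(CoordinationService cs) {
            // Take ownership of a bundle of topics; the lock is tied to the metadata
            // session (ZK session / etcd lease), which can be re-validated later.
            try (ResourceLock<String> lock =
                     cs.acquireLock("/namespace/bundle-0x0000_0x4000", "broker-1").join()) {
                // ... serve the bundle while the lock is held ...
            }

            // Elect a single leader among the brokers, e.g. to centralize
            // load-balancing decisions.
            try (LeaderElection election = cs.startLeaderElection("/loadbalance/leader",
                     () -> System.out.println("this broker is now the leader"))) {
                // ... react to isLeader() transitions ...
            }
        }
    }
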
  25. Successes enabled by MetadataStore APIs in Pulsar 2.10
  26. Pulsar broker metadata session revalidation
      ● All the coordination is going through the CoordinationService
      ● All the locks are using the ResourceLock
      When we lose a ZooKeeper session (or, similarly, an etcd lease), we are able to re-validate it later without having to restart Pulsar brokers. This is a major cluster-stability improvement.
  27. Transparent batching of metadata operations
      ● All the metadata read and write operations go through a single access point
      ● Accumulate operations into a queue and use the underlying API for bulk access (e.g. ZK “multi” or etcd transactions)
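
As an illustration of the pattern only (not Pulsar's actual implementation), a batching writer can queue individual operations and flush them as one bulk request, roughly like this:

    // Illustrative only: the accumulate-and-flush pattern described above, with
    // hypothetical names. Individual writes are queued and periodically flushed
    // as one bulk request, the equivalent of a ZK "multi" or an etcd transaction.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    final class BatchingMetadataWriter {
        record Op(String path, byte[] value, CompletableFuture<Void> done) {}

        private final BlockingQueue<Op> queue = new LinkedBlockingQueue<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        BatchingMetadataWriter() {
            // Flush every few milliseconds; a real implementation would also flush
            // when the batch reaches a size threshold.
            scheduler.scheduleWithFixedDelay(this::flush, 5, 5, TimeUnit.MILLISECONDS);
        }

        CompletableFuture<Void> put(String path, byte[] value) {
            Op op = new Op(path, value, new CompletableFuture<>());
            queue.add(op);       // the caller still sees a single logical operation
            return op.done();
        }

        private void flush() {
            List<Op> batch = new ArrayList<>();
            queue.drainTo(batch);
            if (batch.isEmpty()) {
                return;
            }
            // Here the whole batch would be sent to the backing store as one request;
            // in this sketch we simply complete the pending futures.
            batch.forEach(op -> op.done().complete(null));
        }
    }

The trade-off is a few milliseconds of added latency per write in exchange for far fewer round trips to the metadata store.
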
  28. What’s next?
  29. Current state of metadata in Pulsar
      ● Since Pulsar 2.10, one can choose the metadata service to use for both the local metadata and the configuration store:
        ○ ZooKeeper
        ○ etcd
        ○ It’s possible to add custom implementations
  30. Defining the “perfect” metadata service
      ● Transparent horizontal scalability
      ● Ease of operations (add/remove nodes)
      ● No need for global linearizable history
      ● Scale up to 100 GB of total data set
      ● Read/write rates scalable to ~1M ops/s
      ● Latency targets: reads p99 < 5 ms, writes p99 < 20 ms
  31. Design decisions
      ● Let’s not implement it directly in Pulsar brokers
      ● Let’s not rewrite Paxos/Raft again
      ● Assume the facilities of a cloud-native environment
      ● Design for auto-tuning, from tiny to huge, without admin intervention
  32. What to expect
      ● The ultimate goal is to achieve a 10x increase in the number of topics in a cluster
      ● A small Pulsar cluster should be able to support millions of topics
      ● Handling of metadata is the biggest obstacle
      ● It’s not the only factor, though: we are also working on metrics, lookups, per-topic overhead and global memory limits
  33. Thank you!
      Matteo Merli
      mmerli@streamnative.io
      mmerli@apache.org
      @merlimat
      Pulsar Summit San Francisco, Hotel Nikko, August 18, 2022
