An Introduction to Zookeeper

This presentation deals with Zookeeper, an Apache effort to develop and maintain a highly reliable server with distributed coordination capabilities.

  • ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs.

  • Name service— A name service is a service that maps a name to some information associated with that name. A telephone directory is a name service that maps the name of a person to his/her telephone number. In the same way, a DNS service is a name service that maps a domain name to an IP address. In your distributed system, you may want to keep track of which servers or services are up and running and look up their status by name. ZooKeeper exposes a simple interface to do that. A name service can also be extended to a group membership service, by means of which you can obtain information pertaining to the group associated with the entity whose name is being looked up.
    Locking— To allow for serialized access to a shared resource in your distributed system, you may need to implement distributed mutexes. ZooKeeper provides an easy way for you to implement them.
    Synchronization— Hand in hand with distributed mutexes is the need for synchronizing access to shared resources. Whether implementing a producer-consumer queue or a barrier, ZooKeeper provides a simple interface to implement that. Check out the examples on the Apache ZooKeeper wiki on how to do so (see Resources).
    Configuration management— You can use ZooKeeper to centrally store and manage the configuration of your distributed system. This means that any new nodes joining will pick up the up-to-date centralized configuration from ZooKeeper as soon as they join the system. This also allows you to centrally change the state of your distributed system by changing the centralized configuration through one of the ZooKeeper clients.
    Leader election— Your distributed system may have to deal with the problem of nodes going down, and you may want to implement an automatic fail-over strategy. ZooKeeper provides off-the-shelf support for doing so via leader election.
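The leader-election recipe mentioned above rests on ephemeral sequential znodes: each candidate creates one, and the lowest sequence number wins. The following is a minimal in-memory sketch of that logic, not real ZooKeeper client code; the `MiniEnsemble` class, the `/svc/election/n-` path, and the host names are all invented for illustration.

```python
# Hypothetical in-memory simulation of ZooKeeper-style leader election
# using ephemeral sequential znodes. Not a real ZooKeeper client API.

class MiniEnsemble:
    """Simulates sequential znode creation under an election path."""

    def __init__(self):
        self.counter = 0
        self.nodes = {}  # path -> owning session

    def create_sequential(self, prefix, owner):
        # ZooKeeper appends a monotonically increasing, zero-padded counter.
        path = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.nodes[path] = owner
        return path

    def leader(self):
        # The participant holding the lowest sequence number is the leader.
        return self.nodes[min(self.nodes)]

    def session_expired(self, owner):
        # Ephemeral nodes vanish when their owner's session ends.
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}


ensemble = MiniEnsemble()
for host in ["hostA", "hostB", "hostC"]:
    ensemble.create_sequential("/svc/election/n-", host)

print(ensemble.leader())        # hostA holds n-0000000000
ensemble.session_expired("hostA")
print(ensemble.leader())        # leadership passes to hostB
```

In a real deployment the same comparison is driven by watches: each follower watches the znode with the next-lower sequence number, so only one client is notified when the leader's ephemeral node disappears.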
  • ZooKeeper, while being a coordination service for distributed systems, is a distributed application on its own. ZooKeeper follows a simple client-server model where clients are nodes (i.e., machines) that make use of the service, and servers are nodes that provide the service. A collection of ZooKeeper servers forms a ZooKeeper ensemble. At any given time, one ZooKeeper client is connected to one ZooKeeper server. Each ZooKeeper server can handle a large number of client connections at the same time. Each client periodically sends pings to the ZooKeeper server it is connected to let it know that it is alive and connected. The ZooKeeper server in question responds with an acknowledgment of the ping, indicating the server is alive as well. When the client doesn't receive an acknowledgment from the server within the specified time, the client connects to another server in the ensemble, and the client session is transparently transferred over to the new ZooKeeper server.
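The ping-and-failover behavior described above can be sketched as follows. This is a toy model, assuming an invented `ping()` stub and made-up server addresses; a real client speaks the ZooKeeper session protocol and moves its session state transparently.

```python
# Hypothetical sketch of client-side failover across a ZooKeeper ensemble:
# ping the current server, and reconnect to another member on a missed ack.

ENSEMBLE = ["zk1:2181", "zk2:2181", "zk3:2181"]

def ping(server, alive_servers):
    """Stand-in for a session ping; returns True if the server acknowledges."""
    return server in alive_servers

def connect(alive_servers, current=None):
    """Return a live server, preferring the existing connection."""
    if current and ping(current, alive_servers):
        return current
    for server in ENSEMBLE:
        if ping(server, alive_servers):
            return server  # the session is carried over to this server
    raise ConnectionError("no ZooKeeper server reachable")


session = connect({"zk1:2181", "zk2:2181", "zk3:2181"})
print(session)                                    # zk1:2181
session = connect({"zk3:2181"}, current=session)  # zk1 stopped answering
print(session)                                    # failover to zk3:2181
```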
  • ZooKeeper has a hierarchical name space, much like a distributed file system. The only difference is that each node in the namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory. Paths to nodes are always expressed as canonical, absolute, slash-separated paths; there are no relative references. Any Unicode character can be used in a path, subject to the following constraints:
    The null character (\u0000) cannot be part of a path name. (This causes problems with the C binding.)
    The following characters can't be used because they don't display well, or render in confusing ways: \u0001 - \u0019 and \u007F - \u009F.
    The following characters are not allowed: \ud800 - \uF8FF, \uFFF0 - \uFFFF, \uXFFFE - \uXFFFF (where X is a digit 1 - E), \uF0000 - \uFFFFF.
    The "." character can be used as part of another name, but "." and ".." cannot alone be used to indicate a node along a path, because ZooKeeper doesn't use relative paths. The following would be invalid: "/a/b/./c" or "/a/b/../c".
    The token "zookeeper" is reserved.
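The path rules above can be condensed into a small checker. This is an illustrative Python port of only the constraints quoted in the text (the real server enforces them in its Java `PathUtils.validatePath`), not an exhaustive reimplementation.

```python
# Hedged sketch of the znode path constraints listed above.

def validate_path(path):
    if not path.startswith("/"):
        return False                      # paths are absolute
    if "\u0000" in path:
        return False                      # null character is forbidden
    for ch in path:
        if "\u0001" <= ch <= "\u0019" or "\u007f" <= ch <= "\u009f":
            return False                  # characters that don't display well
    for part in path.split("/")[1:]:
        if part in (".", ".."):
            return False                  # no relative path components
    return True


assert validate_path("/a/b/c")
assert validate_path("/a/b.txt")          # "." inside a name is fine
assert not validate_path("/a/b/../c")     # ".." alone is rejected
assert not validate_path("a/b")           # must be absolute
```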

    ZNodes maintain a stat structure that includes version numbers for data changes and ACL changes. The stat structure also has timestamps. The version number, together with the timestamp, allows ZooKeeper to validate the cache and to coordinate updates. Each time a znode's data changes, the version number increases. For instance, whenever a client retrieves data, it also receives the version of the data. And when a client performs an update or a delete, it must supply the version of the data of the znode it is changing. If the version it supplies doesn't match the actual version of the data, the update fails.
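The version check above is a compare-and-set: a stale writer is rejected. A minimal sketch, assuming an invented `VersionedStore` class that mirrors ZooKeeper's conditional setData in an ordinary dict:

```python
# Toy model of ZooKeeper's versioned updates; not a real client API.

class BadVersionError(Exception):
    pass

class VersionedStore:
    def __init__(self):
        self.store = {}  # path -> (data, version)

    def create(self, path, value):
        self.store[path] = (value, 0)

    def get(self, path):
        return self.store[path]          # clients get data *and* its version

    def set(self, path, value, version):
        _, current = self.store[path]
        if version != current:           # stale writer loses
            raise BadVersionError(f"expected version {current}, got {version}")
        self.store[path] = (value, current + 1)


store = VersionedStore()
store.create("/config", b"v1")
_, ver = store.get("/config")
store.set("/config", b"v2", ver)         # succeeds; version becomes 1
try:
    store.set("/config", b"v3", ver)     # same stale version: rejected
except BadVersionError as e:
    print("update failed:", e)
```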

    ZooKeeper was not designed to be a general database or large object store. Instead, it manages coordination data. This data can come in the form of configuration, status information, rendezvous, etc. A common property of the various forms of coordination data is that they are relatively small: measured in kilobytes. The ZooKeeper client and the server implementations have sanity checks to ensure that znodes have less than 1M of data, but the data should be much less than that on average. Operating on relatively large data sizes will cause some operations to take much more time than others and will affect the latencies of some operations because of the extra time needed to move more data over the network and onto storage media. If large data storage is needed, the usual pattern of dealing with such data is to store it on a bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper.
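That pointer pattern looks like this in miniature. Both stores are simulated with dicts, and the `hdfs://` location and `/app/model-pointer` path are hypothetical:

```python
# Illustration of the pattern above: large payloads live in a bulk store
# (HDFS, NFS, ...) while ZooKeeper holds only a small pointer to them.

bulk_store = {}   # stands in for HDFS/NFS
znodes = {}       # stands in for ZooKeeper; values stay in the kilobyte range

def publish_model(blob):
    location = "hdfs://cluster/models/current"        # hypothetical path
    bulk_store[location] = blob                       # megabytes go here
    znodes["/app/model-pointer"] = location.encode()  # only the pointer in ZK

def fetch_model():
    location = znodes["/app/model-pointer"].decode()
    return bulk_store[location]


publish_model(b"x" * 10_000_000)  # 10 MB payload, far over the 1 MB znode cap
assert len(znodes["/app/model-pointer"]) < 1024       # the znode stays tiny
assert len(fetch_model()) == 10_000_000
```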
  • Apache Hadoop relies on ZooKeeper for automatic fail-over of Hadoop HDFS Namenode and for the high availability of YARN ResourceManager.
    Apache HBase, a distributed database built on Hadoop, uses ZooKeeper for master election, lease management of region servers, and other communication between region servers.
    Apache Accumulo, another sorted, distributed key/value store, is built on top of Apache ZooKeeper (and Apache Hadoop).
    Apache Solr uses ZooKeeper for leader election and centralized configuration.
    Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications. Mesos uses ZooKeeper for a fault-tolerant, replicated master.
    Neo4j is a distributed graph database that uses ZooKeeper for write master selection and read slave coordination.
    Cloudera Search integrates search functionality (via Apache Solr) with Apache Hadoop using ZooKeeper for centralized configuration management.

    1. 1. Apache Zookeeper
    2. 2. What is a Distributed System? A distributed system consists of multiple computers that communicate through a computer network and interact with each other to achieve a common goal.
    3. 3. What is Wrong Here ?
    4. 4. Manual Coordination
    5. 5. Automated Coordination
    6. 6. Coordination in a Distributed System ● Coordination: An act that multiple nodes must perform together. ● Examples: o Group membership o Locking o Leader Election o Synchronization. o Publisher/Subscriber Getting node coordination correct is very hard!
    7. 7. Introducing ZooKeeper ZooKeeper is a Distributed Coordination Service for Distributed Applications. ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers. - ZooKeeper Wiki
    8. 8. What is ZooKeeper ? ● An open source, high-performance coordination service for distributed applications. ● Exposes common services in a simple interface: o naming o configuration management o locks & synchronization o group services … developers don't have to write them from scratch ● Build on it for your own specific needs.
    9. 9. ZooKeeper Use Cases ● Configuration Management o Cluster member nodes bootstrapping configuration from a centralized source in unattended way o Easier, simpler deployment/provisioning ● Distributed Cluster Management o Node join / leave o Node status in real time ● Naming service – e.g. DNS ● Distributed synchronization - locks, barriers, queues. ● Leader election in a distributed system. ● Centralized and highly reliable (simple) data registry.
    10. 10. The ZooKeeper Service ● The ZooKeeper service is replicated over a set of machines ● All machines store a copy of the data (in memory) ● A leader is elected on service startup ● Clients connect to only a single ZooKeeper server & maintain a TCP connection. ● A client can read from any ZooKeeper server; writes go through the leader & need majority consensus.
    11. 11. The ZooKeeper Data Model • ZooKeeper has a hierarchical name space. • Each node in the namespace is called a ZNode. • Every ZNode has data (given as byte[]) and can optionally have children. • ZNode paths: – canonical, absolute, slash-separated – no relative references – names can have Unicode characters • ZNodes: • Maintain a stat structure with version numbers for data changes, ACL changes, and timestamps. • Version numbers increase with changes • Data is read and written in its entirety [Slide diagram: a tree rooted at /Zoo with children Duck, Goat, and Cow, i.e. the paths /Zoo/Duck, /Zoo/Goat, /Zoo/Cow]
    12. 12. ZNode Types ● Persistent Nodes o exists till explicitly deleted ● Ephemeral Nodes o exists as long as the session is active o can’t have children ● Sequence Nodes (Unique Naming) o append a monotonically increasing counter to the end of path o applies to both persistent & ephemeral nodes
    13. 13. ZNode Operations All these operations can be sync as well as async
    14. 14. ZNode Watches ● Clients can set watches on znodes: o NodeChildrenChanged o NodeCreated o NodeDataChanged o NodeDeleted ● Changes to a znode trigger the watch and ZooKeeper sends the client a notification. ● Watches are one time triggers. ● Watches are always ordered. ● Client sees watched event before new znode data. ● Client should handle cases of latency between getting the event and sending a new request to get a watch.
    15. 15. ZNode Reads & Writes ● Read requests are processed locally at the ZooKeeper server to which the client is currently connected ● Write requests are forwarded to the leader and go through majority consensus before a response is generated.
    16. 16. Consistency Guarantees ● Sequential Consistency: Updates are applied in order ● Atomicity: Updates either succeed or fail ● Single System Image: A client sees the same view of the service regardless of the ZK server it connects to. ● Reliability: Updates persist once applied, until overwritten by another client. ● Timeliness: The clients' view of the system is guaranteed to be up-to-date within a certain time bound. (Eventual Consistency)
    17. 17. Example #1: Cluster Management • Each Client Host i, i:=1 .. N • Watch on /members • Create /members/host- ${i} as ephemeral nodes • Node Join/Leave generates alert • Keep updating /members/host-${i} periodically for node status changes • (load, memory, CPU etc.)
    18. 18. Example #2: Leader Election ● A znode, say "/svc/election-path" ● All participants of the election process create an ephemeral-sequential node on the same election path. ● The node with the smallest sequence number is the leader. ● Each "follower" node watches the node with the next lower sequence number. ● Upon leader removal, go to the election path and find the new leader, or become the leader if you now hold the lowest sequence number. ● Upon session expiration, check the election state and re-enter the election if needed.
    19. 19. ZooKeeper In Action @Twitter • Used within Twitter for service discovery • How? • Services register themselves in ZooKeeper • Clients query the production cluster for service "A" in data center "XYZ" • An up-to-date host list for each service is maintained • Whenever new capacity is added, clients automatically become aware of it • Also enables load balancing across all servers.
    20. 20. A few points to remember ● Watches are one-time triggers ● Continuously watching a znode requires resetting the watch after every event ● Too many watches on a single znode create a "herd effect", causing bursts of traffic and limiting scalability ● If a znode changes multiple times between getting the event and setting the watch again, handle it carefully! ● Keep session time-outs long enough to handle long garbage-collection pauses in applications ● Use a dedicated disk for the ZooKeeper transaction log
    21. 21. No Questions? Awesome !!!
    22. 22. Thank You!