Zookeeper is a coordination service that makes distributed systems easier to build. These slides summarize how Zookeeper is used and take the Kazoo Python library as the working example.
4. A Distributed System - Master-Worker
• Coordination tasks:
1. elect a new master when the current master crashes
2. the master assigns tasks to workers
3. when a worker crashes, its tasks are re-assigned to other workers
4. when a worker finishes its task, the master assigns it a new one
[Diagram: one master dispatching tasks to a pool of workers]
5. Distributed System
• An application consists of programs that run on a
group of computers.
• Coordinating them is harder than writing a
standalone program.
• Developers may spend too much time handling the
coordination, or end up with a fragile distributed
system (e.g. race conditions, single points of failure).
6. Easy Distributed System by Zookeeper
• Common coordination tasks:
• Naming service
• Configuration management
• Synchronization
• Leader election
• Message queue
• Notification system
• Zookeeper provides a highly reliable API for these
common coordination tasks
http://en.wikipedia.org/wiki/Apache_ZooKeeper#Typical_use_cases
7. Powered By Zookeeper
• Zookeeper was originally built at Yahoo! Research
• Users include:
• Hadoop, HBase
• Solr
• Neo4j
• Flume
• Facebook Messages
8. Benefits of Zookeeper
• With Zookeeper:
• development of distributed systems is simpler,
more agile, and more robust
• Zookeeper itself is simple, fast, and replicated
• Without Zookeeper:
• each project must hand-roll this coordination,
which is hard to get right
9. Benefits of Zookeeper
• Servers replicate data
• Clients connect to one of the servers
• Throughput test
• Hardware: dual 2GHz Xeon and
two SATA 15K RPM drives
11. Znode (1/2)
• Based on a shared storage
model: each client stores and
retrieves data from the
Zookeeper service
• File system-like API
• Znode: a node in a
hierarchical tree, holding
optional data and optional
child znodes
• Persistent znodes disappear
only after an explicit delete
operation
• Ephemeral znodes disappear
when the creating client
crashes or closes its
connection; they can also be
deleted explicitly by any client
12. Znode (2/2)
• Sequential znodes are
assigned a monotonically
increasing integer appended
to the path, e.g.
/path-1, /path-2
• Versions: each znode
has a version number
that increases whenever
its data changes
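The znode types above map onto the flags of kazoo's `create` call. A minimal sketch follows; the paths, the data, and a server at 127.0.0.1:2181 are assumptions made for illustration:

```python
def sequence_number(path):
    """Parse the counter the server appends to a sequential znode,
    e.g. '/path-0000000002' -> 2."""
    return int(path.rsplit("-", 1)[1])

def create_examples(zk):
    # persistent (the default): survives until explicitly deleted
    zk.create("/config", b"some data")
    # ephemeral: removed automatically when this client's session ends
    zk.create("/alive", b"", ephemeral=True)
    # sequential: the server appends a monotonically increasing counter
    path = zk.create("/path-", b"", sequence=True)
    return sequence_number(path)

if __name__ == "__main__":
    from kazoo.client import KazooClient
    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()
    try:
        print(create_examples(zk))
    finally:
        zk.stop()
```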
14. Notification
• Set a watch via a znode operation (getData,
getChildren, exists) to be notified when
the target changes
• A watch is:
• a one-time trigger
• ordered: events are delivered to the client
in the order in which they occurred
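In kazoo, such a watch is attached through the `watch=` argument of `get`, `get_children`, or `exists`. A sketch under assumed names (the `/config` path and the callback are made up):

```python
received = []

def on_change(event):
    # one-time trigger: after this fires, the watch must be re-set
    # if you want further notifications
    received.append((event.type, event.path))

def read_and_watch(zk, path="/config"):
    # getData-style read that also registers the watch
    data, stat = zk.get(path, watch=on_change)
    return data

if __name__ == "__main__":
    from kazoo.client import KazooClient
    zk = KazooClient()
    zk.start()
    print(read_and_watch(zk))
```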
15. Session
• Session: a client creates a session with one of
the servers and then starts issuing operations
• Session states:
• connecting
• connected
• closed
• not_connected
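kazoo surfaces session-state changes to the application through a listener callback registered with `add_listener` (a sketch; the listener name and the log list are made up):

```python
state_log = []

def session_listener(state):
    # kazoo calls this with KazooState.CONNECTED, SUSPENDED, or LOST
    # as the session moves between the states listed above
    state_log.append(state)

if __name__ == "__main__":
    from kazoo.client import KazooClient
    zk = KazooClient()
    zk.add_listener(session_listener)
    zk.start()
```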
16. Example - implement a lock
• Spec: n clients try to get the lock at the same
time, but only one of them can hold it.
• Solution: clients try to create an ephemeral
znode, e.g. /lock. The first one succeeds and
holds the lock; the rest, whose create fails,
set a watch to learn when the lock is released
and then try to acquire it again.
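The solution above can be sketched directly on znode primitives. This is a sketch of the slide's recipe, not kazoo's built-in Lock; the function name and `/lock` path follow the slide, everything else is assumed:

```python
import threading

try:
    from kazoo.exceptions import NodeExistsError
except ImportError:  # allow reading the sketch without kazoo installed
    class NodeExistsError(Exception):
        pass

def acquire_lock(zk, path="/lock", identifier=b"me"):
    """Block until this client owns the ephemeral znode at `path`."""
    while True:
        try:
            # ephemeral: the lock is released automatically if we crash
            zk.create(path, identifier, ephemeral=True)
            return
        except NodeExistsError:
            released = threading.Event()
            # one-time watch fires when /lock is deleted
            if zk.exists(path, watch=lambda event: released.set()):
                released.wait()
            # retry: another waiter may have grabbed the lock first
```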
17. Example - implement master-worker
• Spec:
• clients submit tasks
• the master watches for new workers and tasks,
and assigns tasks to available workers
• a backup master takes over when the master fails
• workers register themselves and then watch for
new tasks
18. Example - implement master-worker
• Solution:
• ephemeral znode /master for master election
• backup masters set a watch on /master
• persistent znode /workers
• the master sets a watch on /workers
• each worker creates a znode under /workers, e.g. /workers/host1
• persistent sequential znode /tasks
• clients submit tasks by creating znodes under /tasks
• persistent znode /assign
• workers set a watch on their corresponding znode under /assign,
e.g. /assign/host1
• the master assigns a task to a worker by creating a znode under
/assign, e.g. /assign/host1/task1
• a worker marks a task as done by updating the task's data to "done"
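The znode layout above could be bootstrapped as follows. The paths follow the slide; the helper names and the worker id are assumptions for illustration:

```python
def task_name(task_path):
    """'/tasks/task-0000000001' -> 'task-0000000001'"""
    return task_path.rsplit("/", 1)[-1]

def assign_path(worker, task_path):
    # the master hands a task over by creating /assign/<worker>/<task>
    return "/assign/%s/%s" % (worker, task_name(task_path))

def bootstrap(zk, worker):
    # persistent parent znodes, per the layout above
    for path in ("/workers", "/tasks", "/assign", "/assign/" + worker):
        zk.ensure_path(path)
    # the worker registers itself; ephemeral, so a crash deregisters it
    zk.create("/workers/" + worker, b"", ephemeral=True)

if __name__ == "__main__":
    from kazoo.client import KazooClient
    zk = KazooClient()
    zk.start()
    bootstrap(zk, "host1")
```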
20. Zookeeper Server Run Modes
• Standalone: a single server
• Quorum: multiple servers replicate the data
• the cluster uses majority voting to stay
consistent, so it can tolerate fewer than
half of its nodes crashing
• default ports: client (2181), quorum (2182),
election (2183)
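The majority-vote arithmetic works out as follows (a simple sketch of the rule stated above):

```python
def quorum_size(n):
    # a write must be acknowledged by a strict majority of the n servers
    return n // 2 + 1

def tolerated_failures(n):
    # the cluster stays available as long as a majority survives
    return n - quorum_size(n)

# e.g. a 5-node ensemble needs 3 votes and tolerates 2 crashes;
# a 6-node ensemble still only tolerates 2, which is why
# odd-sized ensembles are the usual choice
```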
21. Clients
• Native primitive operations
• C library
• Java library
• Recipes (3rd-party high-level APIs)
• Java: Curator (by Netflix)
• Python: kazoo (by Mozilla and Zope)
22. Java Client Console
• bin/zkCli.sh -server 127.0.0.1:2181
• Commands
• get path [watch]
• ls path [watch]
• set path data [version]
• create path data acl
• delete path [version]
• setquota -n|-b val path
26. Common Recipes
• lock
• election
• counter
• barrier
• partitioner
• party
• queue
• watch
27. Lock
zk = KazooClient()
zk.start()
lock = zk.Lock("/lockpath", "my-identifier")
with lock:  # blocks waiting for lock acquisition
    pass  # do something while holding the lock
# the context manager releases the lock automatically on exit
28. Election
zk = KazooClient()
zk.start()
election = zk.Election("/electionpath", "my-identifier")
# blocks until the election is won, then calls
# my_leader_function()
election.run(my_leader_function)
31. Partitioner
from kazoo.client import KazooClient

client = KazooClient()
client.start()
qp = client.SetPartitioner(
    path='/work_queues', set=('queue-1', 'queue-2', 'queue-3'))
while 1:
    if qp.failed:
        raise Exception("Lost or unable to acquire partition")
    elif qp.release:
        qp.release_set()
    elif qp.acquired:
        for partition in qp:
            pass  # do something with each partition
    elif qp.allocating:
        qp.wait_for_acquire()
32. Party
zk = KazooClient()
zk.start()
party1 = zk.Party("/party1", "my-identifier")
party2 = zk.Party("/party2", "my-identifier")
party1.join()
# after the join, "my-identifier" is a member of party1
# but not of party2:
"my-identifier" in party1
"my-identifier" not in party2